We consider a switched (queueing) network in which there are
constraints on which queues may be served simultaneously; such net- 
works have been used to effectively model input-queued switches and 
wireless networks. The scheduling policy for such a network specifies 
which queues to serve at any point in time, based on the current state 
or past history of the system. In the main result of this paper, we 
provide a new class of online scheduling policies that achieve optimal 
average queue-size scaling for a class of switched networks including 
input-queued switches. In particular, it establishes the validity of a 
conjecture (documented in [25]) about optimal queue-size scaling for 
input-queued switches. 



1. Introduction. A switched network consists of a collection of, say N , queues, 
operating in discrete time. At each time slot, queues are offered service according 
to a service schedule chosen from a specified finite set, denoted by S. The rule for 
choosing a schedule from S at each time slot is called the scheduling policy. New 
work may arrive to each queue at each time slot exogenously and work served from a 
queue may join another queue or leave the network. We shall restrict our attention, 
however, to the case where work arrives in the form of unit-sized packets, and once 
it is served from a queue, it leaves the network, i.e., the network is single-hop. 

Switched networks are special cases of what Harrison [12] calls "stochastic process- 
ing networks" . Switched networks are general enough to model a variety of interest- 
ing applications. For example, they have been used to effectively model input-queued 
switches, the devices at the heart of high-end Internet routers, whose underlying sil- 
icon architecture imposes constraints on which traffic streams can be transmitted 
simultaneously [8]. They have also been used to model multihop wireless networks 
in which interference limits the amount of service that can be given to each host 
[31]. Finally, they can be instrumental in finding the right operational point in a 
data center [27] . 

In this paper, we consider online scheduling policies, that is, policies that only
utilize historical information (i.e., past arrivals and scheduling decisions). The
performance objective of interest is the total queue size, or total number of packets
waiting to be served in the network, on average (appropriately defined). The questions
that we wish to answer are: (a) what is the minimal value of the performance
objective among the class of online scheduling policies, and (b) how does it depend on
the network structure, S, as well as the effective load?

Consider a work-conserving M/D/1 queue with a unit-rate server in which unit-sized
packets arrive as a Poisson process with rate $\rho \in (0,1)$. Then, the average
queue size scales^1 as $1/(1-\rho)$. Such scaling dependence of the average queue size
on $1/(1-\rho)$ (or the inverse of the gap, $1-\rho$, from the load to the capacity) is a
universally observed behavior in a large class of queueing networks. In a switched
network, the scaling of the average total queue size ought to depend on the number
of queues, $N$. For example, consider $N$ parallel M/D/1 queues as described above.
Clearly, the average total queue size will scale as $N/(1-\rho)$. On the other
hand, consider a variation where all of these queues pool their resources into a
single server that works $N$ times faster. Equivalently, by a time change, let each of
the $N$ queues receive packets as an independent Poisson process of rate $\rho/N$, and
each time a common unit-rate server serves a packet from one of the non-empty
queues. Then, the average total queue size scales as $1/(1-\rho)$. Indeed, these are
instances of switched networks that differ in their scheduling set S, which leads to
different queue-size scalings. Therefore, a natural question is the determination of
average queue-size scaling in terms of S and $(1-\rho)$, where $\rho$ is the effective load.
In the context of an $n$-port input-queued switch with $N = n^2$ queues, the optimal
scaling of average total queue size has been conjectured to be $n/(1-\rho)$, that is,
$\sqrt{N}/(1-\rho)$ [25].
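
The contrast between the two examples above can be made concrete with a small numerical sketch. It assumes unit deterministic service times and uses the Pollaczek-Khinchine formula for the mean number in an M/D/1 system; the function names are illustrative and not taken from the paper.

```python
def md1_mean_queue(rho):
    """Mean number in system for an M/D/1 queue with unit service time.

    Pollaczek-Khinchine: L = rho + rho^2 / (2 (1 - rho)), valid for 0 <= rho < 1.
    """
    if not 0 <= rho < 1:
        raise ValueError("load must satisfy 0 <= rho < 1")
    return rho + rho * rho / (2.0 * (1.0 - rho))


def parallel_total(n_queues, rho):
    """N independent M/D/1 queues, each with its own unit-rate server and load rho."""
    return n_queues * md1_mean_queue(rho)


def pooled_total(n_queues, rho):
    """N queues fed at rate rho / N each, sharing one unit-rate server.

    The superposition of the N Poisson streams is Poisson with rate rho, so the
    total number of packets behaves like a single M/D/1 queue, whatever N is.
    """
    return md1_mean_queue(rho)


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        print(rho, parallel_total(16, rho), pooled_total(16, rho))
```

For a fixed load, the first total grows linearly in the number of queues while the second does not depend on it, which is the dependence on the scheduling set S that the text describes.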

As the main result of this paper, we propose a new online scheduling policy 
for any single-hop switched network. This policy effectively emulates an insensitive 
bandwidth sharing network with a product-form stationary distribution with each 
component of this product-form behaving like an M/M/1 queue. This crisp descrip- 
tion of stationary distribution allows us to obtain precise bounds on the average 
queue sizes under this policy. This leads to establishing, as a corollary of our result, 
the validity of the conjecture stated in [25] for input-queued switches. In general, it 
provides explicit bounds on the average total queue size for any switched network. 
Furthermore, due to the explicit bound on the stationary distribution of queue sizes 
under our policy, we are able to establish a form of large-deviations optimality of 
the policy for any single-hop switched network. 

We note that the validity of the conjecture in [25] for input-queued switches,
stating that optimal average total queue size scales as $\sqrt{N}/(1-\rho)$, is a significant
improvement over the best known bounds of $O(N/(1-\rho))$ (due to the moment
bounds of [20] for the maximum weight policy) or $O(\sqrt{N}\log N/(1-\rho)^2)$ (obtained
by using a batching policy [21]).

^1 In this paper, by scaling of a quantity we mean its dependence (ignoring universal constants) on
$1/(1-\rho)$ and/or the number of queues, $N$, as these quantities become large. Of particular interest is the
scaling as $\rho \to 1$ and $N \to \infty$, in that order.

Our analysis consists of two principal components. The first is a scheduling
mechanism that is able to emulate, in discrete time, any continuous-time bandwidth
allocation within a bounded degree of error. This scheduler maintains a
continuous-time queueing process and tracks its own queue-size process. If, evaluated
under a certain decomposition, the gap between the idealized continuous-time process and
the real queueing process becomes too large, then an appropriate schedule is
allocated. The second is a specific bandwidth allocation that we implement, named the
store-and-forward allocation (SFA) policy. This policy was first considered by Massoulié, and
was subsequently discussed in the thesis of Proutière [22, Section 3.4]. It was shown
to be insensitive with respect to phase-type service distributions in works by Bonald
and Proutière [3, 4]. The insensitivity of this policy for general service distributions
was established by Zachary [37]. The store-and-forward bandwidth allocation policy
is closely related to classical product-form multiclass queueing networks, which
have highly desirable queue-size scalings. By emulating these queueing networks, we
are able to translate results that yield optimal queue-size bounds for a switched
network.

1.1. Organization. In Section 2, we specify a stochastic switched network model.
In Section 3, we discuss related works. Section 4 details the necessary background
on the insensitive store-and-forward bandwidth allocation (SFA) policy. The main
result of the paper is presented and proved in Section 5. We first describe the policy
for single-hop switched networks, and state our main result, Theorem 5.2. This is
followed by a discussion of the optimality of the policy. We then provide a proof of
Theorem 5.2. A discussion of directions for future work is provided in Section 6.

Notation. Let $\mathbb{N}$ be the set of natural numbers $\{1, 2, \ldots\}$, let $\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$,
let $\mathbb{R}$ be the set of real numbers, and let $\mathbb{R}_+ = \{x \in \mathbb{R} : x \ge 0\}$. Let $\mathbb{I}[A]$ be the
indicator function of an event $A$. Let $x \wedge y = \min(x, y)$, $x \vee y = \max(x, y)$, and
$[x]^+ = x \vee 0$. When $x$ is a vector, the maximum is taken componentwise.

We will reserve bold letters for vectors in $\mathbb{R}^N$, where $N$ is the number of queues.
For example, $\mathbf{x} = [x_n]_{1 \le n \le N}$. Superscripts on vectors are used to denote labels, not
exponents, except where otherwise noted; thus, for example, $(\mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2)$ refers to
three arbitrary vectors. Let $\mathbf{0}$ be the vector of all 0s, and $\mathbf{1}$ be the vector of all
1s. The vector $\mathbf{e}_i$ is the $i$th unit vector, with all components being 0 but the $i$th
component equal to 1. We use the norm $|\mathbf{x}| = \max_n |x_n|$. For vectors $\mathbf{u}$ and $\mathbf{v}$, and
functions $f : \mathbb{R} \to \mathbb{R}$, we let

$\mathbf{u} \cdot \mathbf{v} = \sum_{n=1}^{N} u_n v_n$ and $f(\mathbf{u}) = [f(u_n)]_{1 \le n \le N}$,

and let matrix multiplication take precedence over dot product so that

$\mathbf{u} \cdot A\mathbf{v} = \mathbf{u} \cdot (A\mathbf{v})$.

Let $A^\top$ be the transpose of matrix $A$. For a set $S \subset \mathbb{R}^N$, denote its convex hull by
$\langle S \rangle$.

2. Switched network model. We now introduce the switched network model. 
Section 2.1 describes the general system model, Section 2.2 lists the probabilistic 
assumptions about the arrival process, and Section 2.3 introduces some useful defi- 
nitions. 

2.1. Queueing dynamics. Consider a collection of $N$ queues. Let time be discrete,
and indexed by $\tau \in \{0, 1, \ldots\}$. Let $Q_i(\tau)$ be the amount of work in queue $i \in
\{1, \ldots, N\}$ at time slot $\tau$. Following our general notation for vectors, we write $\mathbf{Q}(\tau)$
for $[Q_i(\tau)]_{1 \le i \le N}$. The initial queue sizes are $\mathbf{Q}(0)$. Let $A_i(\tau)$ be the total amount of
work arriving to queue $i$, and $B_i(\tau)$ be the cumulative potential service to queue $i$,
up to time $\tau$, with $\mathbf{A}(0) = \mathbf{B}(0) = \mathbf{0}$.

We first define the queueing dynamics for a single-hop switched network. Defining
$d\mathbf{A}(\tau) = \mathbf{A}(\tau + 1) - \mathbf{A}(\tau)$ and $d\mathbf{B}(\tau) = \mathbf{B}(\tau + 1) - \mathbf{B}(\tau)$, the basic Lindley recursion
that we will consider is

(1)    $\mathbf{Q}(\tau + 1) = [\mathbf{Q}(\tau) - d\mathbf{B}(\tau)]^+ + d\mathbf{A}(\tau)$,

where the operation $[\cdot]^+$ is applied componentwise. The fundamental switched
network constraint is that there is some finite set $S \subset \mathbb{R}_+^N$ such that

(2)    $d\mathbf{B}(\tau) \in S$, for all $\tau$.

For the purpose of this work, we shall focus on $S \subset \{0, 1\}^N$. We will refer to $\sigma \in S$
as a schedule, and $S$ as the set of allowed schedules. In the applications in this paper,
the schedule is chosen based on current queue sizes, which is why it is natural to
write the basic Lindley recursion as (1) rather than the more standard
$[\mathbf{Q}(\tau) + d\mathbf{A}(\tau) - d\mathbf{B}(\tau)]^+$.

For the analysis in this paper, it is useful to keep track of two other quantities.
Let $Z_i(\tau)$ be the cumulative amount of idling at queue $i$, defined by $\mathbf{Z}(0) = \mathbf{0}$ and

(3)    $d\mathbf{Z}(\tau) = [d\mathbf{B}(\tau) - \mathbf{Q}(\tau)]^+$,

where $d\mathbf{Z}(\tau) = \mathbf{Z}(\tau + 1) - \mathbf{Z}(\tau)$. Then, (1) can be rewritten as

(4)    $\mathbf{Q}(\tau) = \mathbf{Q}(0) + \mathbf{A}(\tau) - \mathbf{B}(\tau) + \mathbf{Z}(\tau)$.

Also, let $S_\sigma(\tau)$ be the cumulative amount of time that is spent on using schedule $\sigma$
up to time $\tau$, so that

(5)    $\mathbf{B}(\tau) = \sum_{\sigma \in S} S_\sigma(\tau)\, \sigma$.

A policy that decides which schedule to choose at each time slot $\tau \in \mathbb{Z}_+$ is called
a scheduling policy. In this paper, we will be interested in online scheduling policies.
That is, the scheduling decision at time $\tau$ will be based on historical information,
i.e., the cumulative arrival process $\mathbf{A}(\cdot)$ up to time $\tau$.
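
The bookkeeping in (1) and (3)-(5) can be checked with a short simulation. This is a minimal sketch under stated assumptions: schedules are drawn uniformly at random (a placeholder, not the policy proposed in this paper), and arrivals are Bernoulli per slot rather than Poisson, purely to keep the example small. All names are illustrative.

```python
import random


def simulate(schedule_set, q0, horizon, arrival_prob, seed=0):
    """Run the recursion (1) and accumulate A, B, Z and the schedule counts S_sigma."""
    rng = random.Random(seed)
    n = len(q0)
    Q = list(q0)
    A = [0] * n  # cumulative arrivals
    B = [0] * n  # cumulative potential service
    Z = [0] * n  # cumulative idling
    S = {sigma: 0 for sigma in schedule_set}  # time spent on each schedule
    for _ in range(horizon):
        sigma = rng.choice(schedule_set)  # placeholder policy (assumption)
        dA = [1 if rng.random() < arrival_prob else 0 for _ in range(n)]
        # (3): idling is the part of the offered service that finds no work.
        dZ = [max(sigma[i] - Q[i], 0) for i in range(n)]
        # (1): serve first, then add the new arrivals.
        Q = [max(Q[i] - sigma[i], 0) + dA[i] for i in range(n)]
        for i in range(n):
            A[i] += dA[i]
            B[i] += sigma[i]
            Z[i] += dZ[i]
        S[sigma] += 1
    return Q, A, B, Z, S


if __name__ == "__main__":
    schedules = [(0, 0), (1, 0), (0, 1)]
    q0 = [2, 0]
    Q, A, B, Z, S = simulate(schedules, q0, horizon=200, arrival_prob=0.3, seed=1)
    # Identity (4): Q = Q(0) + A - B + Z, componentwise.
    print([Q[i] == q0[i] + A[i] - B[i] + Z[i] for i in range(2)])
```

The identity (4) holds on every sample path because $[q - b]^+ = q - b + [b - q]^+$, which is exactly how the idling term (3) enters.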

2.2. Stochastic model. We shall assume that the exogenous arrival process for
each queue is independent and Poisson. Specifically, unit-sized packets arrive to
queue $i$ as a Poisson process of rate $\lambda_i$. Let $\boldsymbol{\lambda} = [\lambda_i]_{i=1}^N$ denote the vector of all
arrival rates. The results presented in this paper extend to more general arrival
processes with i.i.d. interarrival times with finite means, using a Poissonization trick.
We discuss this extension in Section 6.

2.3. Useful quantities. We shall assume that the scheduling constraint set S is 
monotone. This is captured in the following assumption. 

Assumption 2.1 (Monotonicity) If $S$ contains a schedule, then $S$ also contains
all of its sub-schedules. Formally, for any $\sigma \in S$, if $\sigma' \in \{0, 1\}^N$ and $\sigma' \le \sigma$
componentwise, then $\sigma' \in S$.

Without loss of generality, we will assume that each unit vector belongs to S. 
Next, we define some quantities that will be useful in the remainder of the paper. 

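Definition 2.2 (Admissible region) Let $S \subset \{0, 1\}^N$ be the set of allowed
schedules. Let $\langle S \rangle$ be the convex hull of $S$, i.e.,

$\langle S \rangle = \Big\{ \sum_{\sigma \in S} \alpha_\sigma \sigma : \sum_{\sigma \in S} \alpha_\sigma = 1, \text{ and } \alpha_\sigma \ge 0 \text{ for all } \sigma \Big\}$.

Define the admissible region $\mathcal{C}$ to be

$\mathcal{C} = \big\{ \boldsymbol{\lambda} \in \mathbb{R}_+^N : \boldsymbol{\lambda} \le \sigma \text{ componentwise, for some } \sigma \in \langle S \rangle \big\}$.

Note that under Assumption 2.1, the admissible region $\mathcal{C}$ and the convex hull $\langle S \rangle$ of
$S$ coincide.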

Given that $\langle S \rangle$ is a polytope contained in $[0, 1]^N$, there exists an integer $J \ge 1$, a
matrix $R \in \mathbb{R}_+^{J \times N}$, and a vector $\mathbf{C} \in \mathbb{R}_+^J$ such that

(6)    $\langle S \rangle = \{\mathbf{x} \in [0, 1]^N : R\mathbf{x} \le \mathbf{C}\}$.

We call $J$ the rank of $\langle S \rangle$ in the representation (6). When it is clear from the context,
we simply call $J$ the rank of $\langle S \rangle$. Note that this rank may be different from the rank
of matrix $R$. Our results will exploit the fact that the rank $J$ may be an order of
magnitude smaller than $N$.

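Definition 2.3 (Static planning problems and load) Define the static planning
optimization problem PRIMAL($\boldsymbol{\lambda}$) for $\boldsymbol{\lambda} \in \mathbb{R}_+^N$ to be

(7)    minimize    $\sum_{\sigma \in S} \alpha_\sigma$

(8)    subject to    $\boldsymbol{\lambda} \le \sum_{\sigma \in S} \alpha_\sigma \sigma$,

(9)    $\alpha_\sigma \in \mathbb{R}_+$, for all $\sigma \in S$.

Define the induced load by $\boldsymbol{\lambda}$, denoted by $\rho(\boldsymbol{\lambda})$, as the value of the optimization
problem PRIMAL($\boldsymbol{\lambda}$).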

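Note that $\boldsymbol{\lambda}$ is admissible if and only if $\rho(\boldsymbol{\lambda}) \le 1$. It also follows immediately from
Definition 2.3 that

$\rho(\boldsymbol{\lambda}) = \inf\{\gamma \ge 0 : R\boldsymbol{\lambda} \le \gamma \mathbf{C}\}$,

and $\boldsymbol{\lambda}$ is admissible if and only if $R\boldsymbol{\lambda} \le \mathbf{C}$, componentwise.

The following is a simple and useful property of $\rho(\cdot)$: for any $\mathbf{a}, \mathbf{b} \in \mathbb{R}_+^N$,

(10)    $\rho(\mathbf{a} + \mathbf{b}) \le \rho(\mathbf{a}) + \rho(\mathbf{b})$.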
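
The expression for the load in terms of $R$ and $\mathbf{C}$ is easy to evaluate directly. The sketch below assumes every entry of $\mathbf{C}$ is strictly positive and $R$ is non-negative, so that the infimum equals $\max_j (R\boldsymbol{\lambda})_j / C_j$; the function name and the 2-port example are illustrative.

```python
def load(R, C, lam):
    """rho(lam) = inf{gamma >= 0 : R lam <= gamma C}, assuming all C_j > 0.

    With non-negative R and lam this is max_j (R lam)_j / C_j.
    """
    return max(
        sum(r_ji * l_i for r_ji, l_i in zip(row, lam)) / c_j
        for row, c_j in zip(R, C)
    )


if __name__ == "__main__":
    # A 2-port switch with queues ordered (1,1), (1,2), (2,1), (2,2):
    # two input constraints followed by two output constraints.
    R = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
    C = [1, 1, 1, 1]
    a = [0.2, 0.3, 0.1, 0.4]
    b = [0.3, 0.0, 0.2, 0.1]
    total = [x + y for x, y in zip(a, b)]
    # Subadditivity (10): the load of a + b is at most the sum of the loads.
    print(load(R, C, a), load(R, C, b), load(R, C, total))
```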



2.4. Motivating example. An Internet router has several input ports and output 
ports. A data transmission cable is attached to each of these ports. Packets arrive 
at the input ports. The function of the router is to work out which output port each 
packet should go to, and to transfer packets to the correct output ports. This last 
function is called switching. There are a number of possible switch architectures; we 
will consider the commercially popular input-queued switch architecture. 

Figure 1 illustrates an input-queued switch with three input ports and three
output ports. Packets arriving at input $k$ destined for output $\ell$ are stored at input port
$k$, in queue $Q_{k,\ell}$; thus there are $N = 9$ queues in total. (For this example, it is more
natural to use double indexing, e.g., $Q_{3,2}$, whereas for general switched networks it
is more natural to use single indexing, e.g., $Q_i$ for $1 \le i \le N$.)

The switch operates in discrete time. At each time slot, the switch fabric can 
transmit a number of packets from input ports to output ports, subject to the two 
constraints that each input can transmit at most one packet, and that each output 
can receive at most one packet. In other words, at each time slot the switch can 
choose a matching from inputs to outputs. The schedule $\sigma \in \mathbb{R}_+^{3 \times 3}$ is given by
$\sigma_{k,\ell} = 1$ if input port $k$ is matched to output port $\ell$ in a given time slot, and $\sigma_{k,\ell} = 0$
otherwise. The matching constraints require that $\sum_{m=1}^{3} \sigma_{k,m} \le 1$ for $k = 1, 2, 3$,
and $\sum_{m=1}^{3} \sigma_{m,\ell} \le 1$ for $\ell = 1, 2, 3$. Figure 1 shows two possible matchings. On the
left-hand side, the matching allows a packet to be transmitted from input port 3 to
output port 2, but since $Q_{3,2}$ is empty, no packet is actually transmitted.

Fig 1. An input-queued switch, and two example matchings of inputs to outputs.

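In general, for an $n$-port switch, there are $N = n^2$ queues. The corresponding
schedule set $S$ is defined as

$S = \Big\{ \sigma \in \{0, 1\}^{n \times n} : \sum_{m=1}^{n} \sigma_{k,m} \le 1, \ \sum_{m=1}^{n} \sigma_{m,\ell} \le 1, \ 1 \le k, \ell \le n \Big\}$.

It can be checked that $S$ is monotone. Furthermore, due to the Birkhoff-von Neumann
theorem [2, 33], the convex hull of $S$ is given by

$\langle S \rangle = \Big\{ \mathbf{x} \in [0, 1]^{n \times n} : \sum_{m=1}^{n} x_{k,m} \le 1, \ \sum_{m=1}^{n} x_{m,\ell} \le 1, \ 1 \le k, \ell \le n \Big\}$.

That is, the rank of $\langle S \rangle$ is less than or equal to $2n = 2\sqrt{N}$ for an $n$-port switch.
Finally, given an arrival rate matrix^2 $\boldsymbol{\lambda} \in [0, 1]^{n \times n}$, $\rho(\boldsymbol{\lambda})$ is given by

$\rho(\boldsymbol{\lambda}) = \max_{1 \le k, \ell \le n} \Big\{ \sum_{m=1}^{n} \lambda_{k,m}, \ \sum_{m=1}^{n} \lambda_{m,\ell} \Big\}$.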
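
For small $n$, the schedule set of the switch can be enumerated by brute force, which gives a direct check of monotonicity and of the row/column form of the load. This is an illustrative sketch only (the enumeration is exponential in $n^2$); the function names are assumptions, not notation from the paper.

```python
from itertools import product


def switch_schedules(n):
    """All 0/1 n x n matrices with at most one 1 in every row and column."""
    result = []
    for bits in product((0, 1), repeat=n * n):
        sigma = tuple(bits[k * n:(k + 1) * n] for k in range(n))
        rows_ok = all(sum(row) <= 1 for row in sigma)
        cols_ok = all(sum(sigma[k][l] for k in range(n)) <= 1 for l in range(n))
        if rows_ok and cols_ok:
            result.append(sigma)
    return result


def is_monotone(schedules):
    """Check that removing any single served queue from a schedule stays in the set.

    By induction this implies that every sub-schedule is in the set.
    """
    allowed = set(schedules)
    for sigma in allowed:
        n = len(sigma)
        for k in range(n):
            for l in range(n):
                if sigma[k][l]:
                    smaller = tuple(
                        tuple(0 if (a, b) == (k, l) else sigma[a][b] for b in range(n))
                        for a in range(n)
                    )
                    if smaller not in allowed:
                        return False
    return True


def switch_load(lam):
    """Largest row sum or column sum of the arrival rate matrix."""
    n = len(lam)
    row_sums = [sum(lam[k]) for k in range(n)]
    col_sums = [sum(lam[k][l] for k in range(n)) for l in range(n)]
    return max(row_sums + col_sums)


if __name__ == "__main__":
    S3 = switch_schedules(3)
    print(len(S3), is_monotone(S3))
    print(switch_load([[0.2, 0.3, 0.1], [0.1, 0.1, 0.5], [0.3, 0.2, 0.2]]))
```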



3. Related works. The question of determining the optimal scaling of queue
sizes in switched networks, or more generally, stochastic processing networks, has
been an important intellectual pursuit for more than a decade. The complexity of
the generic stochastic processing network makes this task extremely challenging.
Therefore, in search of tractable analysis, most of the prior work has been on trying
to understand optimal scaling and scheduling policies for scaled systems: primarily,
with respect to fluid and heavy-traffic scaling.

^2 Not a vector, for notational convenience, as discussed earlier.

In heavy-traffic analysis, one studies the queue-size behavior under a diffusion
(or heavy-traffic) scaling. This regime was first considered by Kingman [18]; since
then, a substantial body of theory has developed, and modern treatments can be
found in [5, 11, 35, 36]. Stolyar [30] has studied a class of myopic scheduling policies,
known as the maximum weight policy, introduced by Tassiulas and Ephremides [31],
for a generalized switch model in the diffusion scaling. In a general version of the
maximum weight policy, a schedule with maximum weight is chosen at each time
step, with the weight of a schedule being equal to the sum of the weights of the
queues chosen by that schedule. The weight of a queue is a function of its size.
In particular, for the one-parameter class of functions parameterized by
$\alpha > 0$, $f(x) = x^\alpha$, the resulting class of policies are called the maximum weight
policies with parameter $\alpha > 0$, and denoted as MW-$\alpha$.
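
The selection rule of MW-$\alpha$ described above can be sketched in a few lines. This assumes the schedule set is small enough to scan exhaustively and breaks ties by taking the first maximizer; both choices, and the function name, are illustrative assumptions.

```python
def max_weight_schedule(queues, schedules, alpha):
    """Return a schedule maximizing sum_i sigma_i * Q_i^alpha (first maximizer on ties)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def weight(sigma):
        return sum(s * (q ** alpha) for s, q in zip(sigma, queues))

    return max(schedules, key=weight)


if __name__ == "__main__":
    schedules = [(1, 0, 0), (0, 1, 1), (0, 0, 1)]
    queues = [4, 1, 2]
    # alpha = 1 weighs queues by size and serves the long queue alone.
    print(max_weight_schedule(queues, schedules, alpha=1.0))
    # A small alpha flattens the weights, so serving two queues wins.
    print(max_weight_schedule(queues, schedules, alpha=0.1))
```

The second call hints at why the $\alpha \to 0$ limit behaves differently: as $\alpha$ shrinks, the weight of a schedule approaches the number of non-empty queues it serves.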

In [30], a complete characterization of the diffusion approximation for the
queue-size process was obtained, under a condition known as "complete resource pooling",
when the network is operating under the MW-$\alpha$ policy, for any $\alpha > 0$. This condition
effectively requires that there exists a scheduling policy which is able to balance
the weights of all the heavily loaded queues. Stolyar [30] showed the remarkable
result that the limiting queue-size vector lives in a one-dimensional state space.
Operationally, this means that all one needs to keep track of is the one-dimensional
total amount of work in the system (called the rescaled workload), and at any point in
time one can assume that the individual queues have all been balanced. Furthermore,
it was established that a max-weight policy minimizes the rescaled workload induced
by any policy under the heavy-traffic scaling (with complete resource pooling). Dai
and Lin [6, 7] have established that a similar result holds (with complete resource
pooling) in the more general setting of a stochastic processing network. In summary,
under the complete resource pooling condition, the results in [6, 7, 30] imply that
the performance of the maximum weight policy in an input-queued switch, or more
generally in a stochastic processing network, is always optimal (in the diffusion limit,
and when each queue size is appropriately weighted). These results suggest that the
average total queue size scales as $1/(1-\rho)$ in the $\rho \to 1$ limit. However, such analyses
do not capture the dependence on the network scheduling structure S. Essentially,
this is because the complete resource pooling condition reduces the system to a
one-dimensional space (which may be highly dependent on a network's structure), and
optimality results are then initially expressed with respect to this one-dimensional
space.

Motivated to capture the dependence of the queue sizes on the network scheduling
structure S, a heavy-traffic analysis of switched networks with multiple bottlenecks
(without resource pooling) was pursued by Shah and Wischik [28]. They established
the so-called multiplicative state space collapse, and identified a member, denoted by
MW-$0^+$ (obtained by taking $\alpha \to 0$), of the class of maximum-weight policies as
optimal with respect to a critical fluid model. In a more recent work, Shah and Wischik
[27] established the optimality of MW-$0^+$ with respect to overloaded fluid models
as well. However, this collection of works stops short of establishing optimality for
diffusion-scaled queue-size processes.

Finally, we take note of the work by Meyn [19] , which establishes that a class of 
generalized maximum weight policies achieve logarithmic (in 1/(1 — p)) regret with 
respect to an optimal policy under certain conditions. 

In a related model, the bandwidth-sharing network model, Kang et al. [15]
have established a diffusion approximation for the proportionally fair bandwidth
allocation policy, assuming a technical "local traffic" condition, but without assuming
complete resource pooling^3. They show that the resulting diffusion approximation
has a product-form stationary distribution. Shah et al. [26] have recently established
that this product-form stationary distribution is indeed the limit of the stationary
distributions of the original stochastic model (an interchange-of-limits result). As
a consequence, if one could utilize a scheduling policy in a switched network that
corresponds to the proportionally fair policy, then the resulting diffusion
approximation will have a product-form stationary distribution, as long as the effective
network scheduling structure S (precisely, $\langle S \rangle$) satisfies the "local traffic condition".
Now, proportional fairness is a continuous-time rate allocation policy that usually
requires rate allocations that are a convex combination of multiple schedules. In
a switched network, a policy must operate in discrete time and has to choose one
schedule at any given time from a finite discrete set S. For this reason, proportional
fairness cannot be implemented directly. However, a natural randomized policy
inspired by proportional fairness is likely to have the same diffusion approximation
(since the fluid models would be identical, and the entire machinery of Kang et al.
[15], building upon the work of Bramson [5] and Williams [36], relies on a fluid
model). As a consequence, if S (more accurately, $\langle S \rangle$) satisfies the "local traffic
condition", then effectively the diffusion-scaled queue sizes would have a product-form
stationary distribution, and would result in bounds similar to those implied by our
results. In comparison, our results are non-asymptotic, in the sense that they hold
for any admissible load, they have a product-form structure, and they do not
require technical assumptions such as the "local traffic condition". Furthermore, such
generality is needed because there are popular examples, such as the input-queued
switch, that do not satisfy the "local traffic condition".

We note that Stolyar [29] and Venkataramanan and Lin [32] established that
the maximum weight policy with weight parameter $\alpha > 0$, MW-$\alpha$, optimizes the tail
exponent of the $1+\alpha$ norm of the queue-size vector. However, these works do not characterize
the tail exponent explicitly. See [24], which has the best known explicit bounds on
the tail exponent.

^3 Kang et al. [15] assume that critically loaded traffic is such that all the constraints are saturated
simultaneously.

In the context of input-queued switches, the example that has primarily motivated 
this work, the policy that we propose has the average total queue size bounded 
within factor 2 of the same quantity induced by any policy, in the heavy-traffic 
limit. Furthermore, this result does not require conditions like complete resource 
pooling. More generally, our policy provides non-asymptotic bounds on queue sizes 
for every arrival rate and switch size. The policy even admits exponential tail bounds 
with respect to the stationary distribution; and the exponent of these tail bounds is 
optimal. These results are significant improvements on the state-of-the-art bounds 
for best performing policies for input-queued switches. As noted in the introduction, 
our bound on the average total queue size is $\sqrt{N}$ times better than the existing
bound for the maximum-weight policy, and $\log N/(1-\rho)$ times better than that for
the batching policy in [21]. (Here $N$ is the number of queues, and $\rho$ the system load.)
For more details of these results, see [25]. 

For a generic switched network, our policy induces average total queue sizes that
scale linearly with the rank of $\langle S \rangle$, under the diffusion scaling. This is in contrast
to the best known bounds, such as those for the maximum weight policy, where the
average queue size scales as $N$, under the diffusion scaling. Therefore, whenever the
rank of $\langle S \rangle$ is smaller than $N$ (the number of queues), our policy provides tighter
bounds. Under our policy, queue sizes admit exponential tail bounds. The bound on
the distribution of queue sizes under our policy leads to an explicit characterization
of the tail exponent, which is optimal for any single-hop switched network.

4. Insensitivity in stochastic networks. This section recalls the background
on insensitive stochastic networks that underlies the main results of this work. 
We shall focus on descriptions of the insensitive bandwidth allocation in so-called 
bandwidth-sharing networks operating in continuous time. Justifications of claims 
made in this section are provided in the Appendix. 

We consider a bandwidth-sharing network operating in continuous time with
capacity constraints. The particular bandwidth-sharing policy of interest is the
so-called "store-and-forward allocation (SFA)", introduced by Bonald and Proutière
[4]. We shall use the SFA as an idealized policy to design online scheduling policies
for switched networks. We now describe the precise model, the SFA policy, and what
we know about its performance.

Model. Let time be continuous and indexed by $t \in \mathbb{R}_+$. Consider a network with
$J \ge 1$ resources indexed from $1, \ldots, J$. Let there be $N$ routes, and suppose that
each packet on route $i$ consumes an amount $R_{ji} \ge 0$ of resource $j$, for each $j \in
\{1, 2, \ldots, J\}$. Let $\mathcal{K}$ be the set of all resource-route pairs $(j, i)$ such that route $i$ uses
resource $j$, i.e., $\mathcal{K} = \{(j, i) : R_{ji} > 0\}$. Without loss of generality, we assume that
for each $i \in \{1, 2, \ldots, N\}$, $\sum_{j} R_{ji} > 0$. Let $R$ be the $J \times N$ matrix with entries
$R_{ji}$. Let $\mathbf{C} \in \mathbb{R}_+^J$ be a positive capacity vector with components $C_j$. For each route
$i$, packets arrive as an independent Poisson process of rate $\lambda_i$. Packets arriving on
route $i$ require a unit amount of service, deterministically.

We denote the number of packets on route $i$ at time $t$ by $M_i(t)$, and define the
queue-size vector at time $t$ by $\mathbf{M}(t) = [M_i(t)]_{i=1}^N \in \mathbb{Z}_+^N$. Each packet gets service
from the network at a rate determined according to a bandwidth-sharing policy.
Once a packet receives its total (unit) amount of service, it departs the network.

We consider online, myopic bandwidth allocations. That is, the bandwidth
allocation at time $t$ only depends on the queue-size vector $\mathbf{M}(t)$. When there are $m_i$
packets on route $i$, that is, if the vector of packets is $\mathbf{m} = [m_i]_{i=1}^N$, let the total
bandwidth allocated to route $i$ be $\phi_i(\mathbf{m}) \in \mathbb{R}_+$. We consider a processor-sharing policy,
so that each packet on route $i$ is served at rate $\phi_i(\mathbf{m})/m_i$, if $m_i > 0$. If $m_i = 0$,
let $\phi_i(\mathbf{m}) = 0$. If the bandwidth vector $\boldsymbol{\phi}(\mathbf{m}) = [\phi_i(\mathbf{m})]_{i=1}^N$ satisfies the capacity
constraints

(11)    $R\boldsymbol{\phi}(\mathbf{m}) \le \mathbf{C}$, componentwise,

for all $\mathbf{m} \in \mathbb{Z}_+^N$, then, in light of Definition 2.2, we say that $\boldsymbol{\phi}(\cdot)$ is an admissible
bandwidth allocation. A Markovian description of the system is given by a process
$\mathbf{X}(t)$ which contains the queue-size vector $\mathbf{M}(t)$ along with the residual workloads
of the set of packets on each route.

Now, on average, $\lambda_i$ units of work arrive to route $i$ per unit time. Therefore, in
order for the Markov process $\mathbf{X}(\cdot)$ to be positive (Harris) recurrent, it is necessary
that

(12)    $R\boldsymbol{\lambda} < \mathbf{C}$, componentwise.

All such $\boldsymbol{\lambda} = [\lambda_i]_{i=1}^N \in \mathbb{R}_+^N$ will be called strictly admissible, in the same spirit as the
admissible region for a switched network.

Store-and-Forward Allocation (SFA) policy. We describe the store-and-forward
allocation policy that was first considered by Massoulié and later analysed in the thesis
of Proutière [22]. Bonald and Proutière [4] established that it induces product-form
stationary distributions and is insensitive with respect to phase-type distributions.
This policy is shown to be insensitive for general service time distributions,
including the deterministic service considered here, by Zachary [37]. The relation between
this policy, the proportionally fair allocation, and multiclass queueing networks is
discussed in depth by Walton [34] and Kelly et al. [16]. The insensitivity property
implies that the invariant measure of the process $\mathbf{M}(t)$ only depends on the
parameters $\boldsymbol{\lambda} = [\lambda_i]_{i=1}^N \in \mathbb{R}_+^N$, and no other aspects of the stochastic description of the
system.

We first give an informal motivation for SFA. SFA is closely related to
quasi-reversible queueing networks. Consider a continuous-time multi-class queueing
network (without scheduling constraints) consisting of processor-sharing queues indexed
by $j \in \{1, \ldots, J\}$ and job types indexed by the routes $i \in \{1, \ldots, N\}$. Each route $i$
job has a service requirement $R_{ji}$ at each queue $j$, and a fixed service capacity $C_j$ is
shared between jobs at the queue. Here each job will sequentially visit all the queues
(so-called store-and-forward) and will visit each queue a fixed number of times. If
we assume jobs on each route arrive as a Poisson process, then the resulting
queueing network will be stable for all strictly admissible arrival rates. Moreover, each
stationary queue will be independent with a queue size that scales, with its load $\rho$,
as $\rho/(1-\rho)$. For further details, see Kelly [17]. So, assuming each queue has equal
load, the total number of jobs within the network is of the order $J\rho/(1-\rho)$. In other
words, these networks have the stability and queue-size scaling that we require, but
they do not obey the necessary scheduling constraints (11). However, these networks
do produce an admissible schedule on average. For this reason, we consider SFA which,
given the number of jobs on each route, allocates the average rate at which jobs are
transferred through this multi-class network. Next, we describe this policy (using
notation similar to that used in [16, 34]).

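Given $\mathbf{m} \in \mathbb{Z}_+^N$, define

$U(\mathbf{m}) = \Big\{ \tilde{\mathbf{m}} = (\tilde{m}_{ji} : (j, i) \in \mathcal{K}) \in \mathbb{Z}_+^{|\mathcal{K}|} : \sum_{j : j \in i} \tilde{m}_{ji} = m_i \text{ for all } 1 \le i \le N \Big\}$.

Here, by the notation $j \in i$ we mean $R_{ji} > 0$. For each $\tilde{\mathbf{m}} \in U(\mathbf{m})$, we abuse notation
somewhat and define $\tilde{m}_j = \sum_{i : i \ni j} \tilde{m}_{ji}$, for all $j \le J$. Also define

$\binom{\tilde{m}_j}{\tilde{m}_{ji} : i \ni j} = \frac{\tilde{m}_j!}{\prod_{i : i \ni j} \tilde{m}_{ji}!}$.

In the above, by $i \ni j$ we mean that $R_{ji} > 0$; the notation $i \ni j$ is used when we
consider a collection of $i$ satisfying this condition for a given $j$. For $\mathbf{m} \in \mathbb{Z}_+^N$, we
define $\Phi(\mathbf{m})$ as

(13)    $\Phi(\mathbf{m}) = \sum_{\tilde{\mathbf{m}} \in U(\mathbf{m})} \prod_{j=1}^{J} \left( \binom{\tilde{m}_j}{\tilde{m}_{ji} : i \ni j} \prod_{i : j \in i} \left( \frac{R_{ji}}{C_j} \right)^{\tilde{m}_{ji}} \right)$.

We shall define $\Phi(\mathbf{m}) = 0$ if any of the components of $\mathbf{m}$ is negative. The
store-and-forward allocation (SFA) assigns rates according to the function $\boldsymbol{\phi} : \mathbb{Z}_+^N \to \mathbb{R}_+^N$ so
that for any $\mathbf{m} \in \mathbb{Z}_+^N$, $\boldsymbol{\phi}(\mathbf{m}) = (\phi_i(\mathbf{m}))_{i=1}^N$, with

(14)    $\phi_i(\mathbf{m}) = \frac{\Phi(\mathbf{m} - \mathbf{e}_i)}{\Phi(\mathbf{m})}$,

where, recall that $\mathbf{m} - \mathbf{e}_i$ is the same as $\mathbf{m}$ at all but the $i$th component; its $i$th
component equals $m_i - 1$. The bandwidth allocation $\boldsymbol{\phi}(\mathbf{m})$ is the stationary throughput of
jobs on the routes of a multi-class queueing network (described above), conditional
on there being $\mathbf{m}$ jobs on each route.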

A priori it is not clear whether the bandwidth allocation described above is even
admissible (i.e., satisfies (11)). This can be argued as follows. The allocation $\phi(m)$ can
be related to the stationary throughput of a multi-class network with a finite number of
jobs, $m$, on each route. Under this scenario (due to the finite number of jobs), each queue
must be stable. Therefore, the load on each queue, $R\phi(m)$, cannot exceed the overall
system capacity $C$. That is, the allocation is admissible. The precise argument along
these lines is provided in, for example, [16, Corollary 2] and [34, Lemma 4.1].
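As an illustration of definitions (13) and (14), the following is a minimal brute-force sketch in Python. It assumes integer entries in $R$ and $C$ (so that exact rational arithmetic can be used) and that every route uses at least one resource; the function names `compositions`, `big_phi` and `sfa_rates`, as well as the toy network, are hypothetical and chosen only for this example. The enumeration is exponential in the number of jobs and is meant for very small instances.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def big_phi(m, R, C):
    """Phi(m) from (13), by brute-force enumeration of U(m); 0 if any m_i < 0."""
    if any(x < 0 for x in m):
        return Fraction(0)
    J, N = len(R), len(m)
    # Resources used by route i, i.e. all j with R[j][i] > 0.
    used = [[j for j in range(J) if R[j][i] > 0] for i in range(N)]
    total = Fraction(0)
    # One element of U(m): for every route i, a split of its m_i jobs over used[i].
    for split in product(*(compositions(m[i], len(used[i])) for i in range(N))):
        term = Fraction(1)
        for j in range(J):
            jobs_at_j = 0
            for i in range(N):
                if j in used[i]:
                    c = split[i][used[i].index(j)]
                    jobs_at_j += c
                    term *= Fraction(R[j][i], C[j]) ** c
                    term /= factorial(c)
            term *= factorial(jobs_at_j)  # completes the multinomial coefficient
        total += term
    return total


def sfa_rates(m, R, C):
    """phi_i(m) = Phi(m - e_i) / Phi(m), as in (14)."""
    denom = big_phi(m, R, C)
    rates = []
    for i in range(len(m)):
        shifted = list(m)
        shifted[i] -= 1
        rates.append(big_phi(shifted, R, C) / denom)
    return rates


# Toy network (an assumption for illustration): routes 0 and 1 each use one
# resource, route 2 uses both; all capacities equal 1.
R = [[1, 0, 1],
     [0, 1, 1]]
C = [1, 1]
phi = sfa_rates((1, 1, 1), R, C)
load = [sum(R[j][i] * phi[i] for i in range(3)) for j in range(2)]
```

On this toy instance the rates are $(3/4, 3/4, 1/4)$, and each resource is loaded exactly at its capacity, so the admissibility constraint $R\phi(m) \le C$ holds with equality.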

The SFA induces a product-form invariant distribution for the number of packets 
waiting in the bandwidth-sharing network and is insensitive. We summarize this in 
the following result. 

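Theorem 4.1 Consider a bandwidth-sharing network with $R\lambda < C$. Under the SFA
policy described above, the Markov process $X(t)$ is positive (Harris) recurrent and
$M(t)$ has a unique stationary probability distribution $\pi$ given by

\[ \pi(m) = \frac{\Phi(m)}{\Phi} \prod_{i=1}^N \lambda_i^{m_i}, \quad \text{for all } m \in \mathbb{Z}_+^N, \tag{15} \]

where

\[ \Phi = \prod_{j=1}^J \left( \frac{C_j}{C_j - \sum_{i : j \in i} R_{ji}\lambda_i} \right) \tag{16} \]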

is a normalizing factor. Furthermore, in steady state, the residual workload of each 
packet in the network is uniformly distributed on [0, 1] and is conditionally indepen- 
dent from the residual workloads of other packets, when we condition on the number 
of packets on each route of the network. 

Note that statements similar to Theorem 4.1 have appeared in other works, for 
example, [3], [34, Proposition 4.2] and [16]. Theorem 4.1 is a summary of these 
statements, and for completeness, it is proved in Appendix A. 

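The following property of the stationary distribution $\pi$ described in Theorem 4.1
will be useful.

Proposition 4.2 Consider the setup of Theorem 4.1 and let $\pi$ be as described by
(15). Define a measure $\tilde\pi$ on $\mathbb{Z}_+^{|\mathcal{K}|}$ as follows: for $\tilde m \in \mathbb{Z}_+^{|\mathcal{K}|}$,

\[ \tilde\pi(\tilde m) = \frac{1}{\Phi} \prod_{j=1}^J \left( \binom{\tilde m_j}{\tilde m_{ji} : i \ni j} \prod_{i : j \in i} \left( \frac{R_{ji}\lambda_i}{C_j} \right)^{\tilde m_{ji}} \right). \tag{17} \]

Then, for any $L \in \mathbb{Z}_+$,

\[ \pi\Big( \Big\{ m : \sum_{i=1}^N m_i = L \Big\} \Big) = \tilde\pi\Big( \Big\{ \tilde m : \sum_{j=1}^J \tilde m_j = L \Big\} \Big). \tag{18} \]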

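Finally, we relate the distribution $\tilde\pi$ to the stationary distribution of an
insensitive multiclass queueing network with a product-form stationary distribution and
geometrically distributed queue sizes.

Proposition 4.3 Consider the distribution $\tilde\pi$ defined in (17). Then, for any
$L = (L_1, \ldots, L_J) \in \mathbb{Z}_+^J$,

\[ \tilde\pi(\tilde m_1 = L_1, \ldots, \tilde m_J = L_J) = \sum_{\tilde m \,:\, \tilde m_j = L_j \text{ for all } j} \tilde\pi\big((\tilde m_{ji})\big) = \prod_{j=1}^J \rho_j^{L_j} (1 - \rho_j), \tag{19} \]

where $\rho_j = \big( \sum_{i : i \ni j} R_{ji}\lambda_i \big) / C_j$.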
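The product form in (19) is easy to evaluate numerically. The sketch below, with hypothetical helper names and an assumed toy network, computes the resource loads $\rho_j$, the probability of a given vector of per-resource job counts, and the mean total number of jobs $\sum_j \rho_j/(1-\rho_j)$ implied by the geometric marginals.

```python
def resource_loads(R, C, lam):
    """rho_j = (sum_i R[j][i] * lam[i]) / C[j] for each resource j."""
    return [sum(r * l for r, l in zip(row, lam)) / c for row, c in zip(R, C)]


def product_form_probability(L, rhos):
    """Probability that resource j holds L[j] jobs for all j, as in (19)."""
    p = 1.0
    for count, rho in zip(L, rhos):
        p *= rho ** count * (1.0 - rho)
    return p


def mean_total_jobs(rhos):
    """Mean of a sum of geometric variables with P(k) = rho^k (1 - rho)."""
    return sum(rho / (1.0 - rho) for rho in rhos)


# Toy data (assumed for illustration): two resources, three routes.
R = [[1, 0, 1],
     [0, 1, 1]]
C = [1.0, 1.0]
lam = [0.3, 0.2, 0.4]

rhos = resource_loads(R, C, lam)          # loads 0.7 and 0.6, up to rounding
p_empty = product_form_probability((0, 0), rhos)
mean_jobs = mean_total_jobs(rhos)
```

The mean grows like $1/(1-\rho_j)$ in the most heavily loaded resource, which is the scaling that the scheduling policy of Section 5 inherits.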

5. Main result: a policy and its performance. In this section, we describe
an online scheduling policy and quantify its performance in terms of explicit, closed- 
form bounds on the stationary distribution of the induced queue sizes. Section 5.1 
describes the policy for a generic switched network and provides the statement of 
the main result. Section 5.2 discusses its implications. Specifically, it discusses (a) 
the optimality of the policy for any switched network with respect to exponential 
tail bounds, and (b) the optimality of the policy for a class of switched networks, 
including input-queued switches, with respect to the average total queue size. Section 
5.3 proves the main result stated in Section 5.1. 

5.1. A policy for switched networks. The basic idea behind the policy, to be
described in detail shortly, is as follows. Given a switched network, denoted by SN,
with constraint set $\mathcal{S}$ and $N$ queues, let $\langle\mathcal{S}\rangle$ have rank $J$ and
representation (cf. (6))

\[ \langle\mathcal{S}\rangle = \{ x \in [0,1]^N : Rx \le C \}, \quad R \in \mathbb{R}_+^{J \times N}, \quad C \in \mathbb{R}_+^J. \]

Now consider a virtual bandwidth-sharing network, denoted by BN, with $N$ routes
corresponding to each of these queues. The resource-route relation is determined
precisely by the matrix $R$, and the $J$ resources have capacities given by $C$. Both
networks, SN and BN, are fed identical arrivals. That is, whenever a packet arrives
to queue $i$ in SN, a packet is added to route $i$ in BN at the same time. The main
question is that of determining a scheduling policy for SN; this will be derived from
BN. Specifically, the BN will operate under the insensitive SFA policy described in
Section 4. Due to Theorem 4.1 as well as Propositions 4.2 and 4.3, this will induce
a desirable stationary distribution of queue sizes in BN. Therefore, if we could use
the rate allocation of BN, that is, the policy SFA, directly in SN, it would give us
the desired performance in terms of the stationary distribution of the induced queue
sizes. Now the rate allocation in BN is such that the instantaneous rate is always
inside $\langle\mathcal{S}\rangle$. However, it could change all the time and need not utilize points of
$\mathcal{S}$ as rates. In contrast, in SN we require that the rate allocation can change only
once per discrete time slot and it must always employ one of the generators of
$\langle\mathcal{S}\rangle$, that is, a schedule from $\mathcal{S}$. The key to our policy is an effective way to
emulate the rate allocation of BN under SFA (or for that matter, any admissible
bandwidth allocation) by utilizing schedules from $\mathcal{S}$ in an online manner and with the
discrete-time constraint. We will see shortly that this emulation policy relies on
$\mathcal{S}$ being monotone (cf. Assumption 2.1).

To that end, we describe this emulation policy. Let us start by introducing some
useful notation. Let $A(\cdot) = (A_i(\cdot))$ be the vector of exogenous, independent Poisson
processes according to which unit-sized packets arrive to both BN and SN simultaneously.
Recall that $A_i(\cdot)$ is a Poisson process with rate $\lambda_i$. Let $M(t) = (M_i(t))$
denote the vector of numbers of packets waiting on the $N$ routes in BN at time $t \ge 0$.
In BN, the services are allocated according to the SFA policy described in Section
4. Let $A^{\mathrm{SFA}}(\cdot) = (A_i^{\mathrm{SFA}}(\cdot)) \in \mathbb{R}_+^N$ denote the cumulative amount of service
allocated to the $N$ routes in BN under the SFA policy: $A_i^{\mathrm{SFA}}(t)$ denotes the total amount
of service allocated to all packets on route $i$ during the interval $[0,t]$, for $t \ge 0$,
with $A_i^{\mathrm{SFA}}(0) = 0$ for $1 \le i \le N$. By definition, all components of $A^{\mathrm{SFA}}(\cdot)$ are
non-decreasing and Lipschitz continuous. Furthermore, $(A^{\mathrm{SFA}}(t+s) - A^{\mathrm{SFA}}(t))/s \in \langle\mathcal{S}\rangle$
for any $t \ge 0$ and $s > 0$. Recall that the (right-)derivative of $A^{\mathrm{SFA}}(\cdot)$ is determined
by $M(\cdot)$ through the function $\phi(\cdot)$ as defined in (14).

Now we describe the scheduling policy for SN that will rely on $A^{\mathrm{SFA}}(\cdot)$. Let
$B(\tau) = (B_i(\tau))$ denote the cumulative amount of service allocated in SN by the
scheduling policy up to time slot $\tau \ge 0$, with $B(0) = 0$. The scheduling policy
determines how $B(\cdot)$ is updated. Let $Q(\tau) = (Q_i(\tau))$ be the queue sizes measured
at the end of time slot $\tau$. Let service be provided according to the scheduling policy
instantly at the beginning of a time slot. Thus, the scheduling policy decides the
schedule $dB(\tau) = B(\tau+1) - B(\tau) \in \mathcal{S}$ at the very beginning of time slot $\tau + 1$.

This decision is made as follows. Let $D(\tau) = A^{\mathrm{SFA}}(\tau) - B(\tau)$. Let $\rho(D(\tau))$ be the
optimal objective value in the optimization problem PRIMAL($D(\tau)$) defined in (7).
In particular, there exists a non-negative combination of schedules in $\mathcal{S}$ such that

\[ \sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma \ge D(\tau), \quad \text{and} \quad \sum_{\sigma \in \mathcal{S}} \alpha_\sigma = \rho(D(\tau)). \tag{20} \]

We claim that in fact, we can find non-negative numbers $\alpha_\sigma$, $\sigma \in \mathcal{S}$, such that

\[ \sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma = D(\tau), \quad \text{and} \quad \sum_{\sigma \in \mathcal{S}} \alpha_\sigma = \rho(D(\tau)). \tag{21} \]

This is formalized in the following lemma.

Lemma 5.1 Let $D \in \mathbb{R}_+^N$ be a non-negative vector. Consider the static planning
problem PRIMAL($D$) defined in (7). Let the optimal objective value of PRIMAL($D$)
be $\rho(D)$. Then there exist $\alpha_\sigma \ge 0$, $\sigma \in \mathcal{S}$, such that (21) holds with $D$ in place of $D(\tau)$.

The proof of the lemma relies on Assumption 2.1, and is provided in the Appendix.

There could be many possible non-negative combinations of $D(\tau)$ satisfying (21).
If there exist non-negative numbers $\alpha_\sigma$, $\sigma \in \mathcal{S}$, satisfying (21) with $\alpha_{\sigma'} \ge 1$ for some
$\sigma' \in \mathcal{S}$, then choose $\sigma'$ as the schedule: set $dB(\tau) = \sigma'$. If no such decomposition
exists for $D(\tau)$, then set $dB(\tau) = \tilde\sigma$, where $\tilde\sigma$ is a solution (ties broken arbitrarily)
of

\[ \text{maximize } \sum_i \sigma_i \quad \text{over } \sigma \in \mathcal{S}, \ \sigma \le D(\tau). \tag{22} \]

Note that $0$ is a feasible solution for the above problem, as $0 \in \mathcal{S}$ and $0 \le D(\tau)$.
Observe also that for all times $\tau$, $dB(\tau) \le D(\tau)$.
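The fallback step (22) and the bookkeeping for the difference vector can be sketched in a few lines of Python. This is only a partial illustration under stated assumptions: the schedule set is given as an explicit list of 0/1 tuples containing the zero schedule, the helper names are hypothetical, and the first branch of the policy (finding a decomposition (21) with some coefficient at least 1) is not shown, since it requires solving a linear program.

```python
def fallback_schedule(schedules, D):
    """Rule (22): among schedules sigma <= D componentwise, maximize sum(sigma)."""
    feasible = [s for s in schedules
                if all(s_i <= d_i for s_i, d_i in zip(s, D))]
    # The zero schedule belongs to a monotone schedule set and D >= 0,
    # so `feasible` is never empty; ties are broken by list order.
    return max(feasible, key=sum)


def update_difference(D, dB, dA_sfa):
    """D(tau+1) = D(tau) - dB(tau) + dA_SFA(tau), from D = A_SFA - B."""
    return [d - b + a for d, b, a in zip(D, dB, dA_sfa)]


# Assumed example: a 2-port switch with queues ordered (1,1), (1,2), (2,1), (2,2);
# the schedule set is the monotone closure of the two perfect matchings.
SCHEDULES = [
    (0, 0, 0, 0),
    (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
    (1, 0, 0, 1), (0, 1, 1, 0),
]

D = [0.5, 1.2, 1.0, 0.3]
dB = fallback_schedule(SCHEDULES, D)
D_next = update_difference(D, dB, [0.25, 0.75, 0.75, 0.25])
```

Here the only schedules dominated by `D` are the zero schedule, the two single-queue schedules for queues (1,2) and (2,1), and their union, so the rule serves both of those queues.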

The above is a complete description of the scheduling policy. Observe that it is 
an online policy, as the virtual network BN can be simulated in an online manner, 
and, given this, the scheduling decision in SN relies only on the history of BN and 
SN. The following result quantifies the performance of the policy. 

Theorem 5.2 Given a strictly admissible arrival rate vector $\lambda$, with $\rho(\lambda) < 1$,
under the above described policy, the switched network SN is positive recurrent and
has a unique stationary distribution. Let $\rho_j = \big( \sum_{i : i \ni j} R_{ji}\lambda_i \big)/C_j$, $j = 1, 2, \ldots, J$, be as in
Proposition 4.3. With respect to this stationary distribution, the following properties
hold:

1. The expected total queue size is bounded as

\[ \mathbb{E}\Big[ \sum_{i=1}^N Q_i \Big] \le \frac{1}{2} \Big( \sum_{j=1}^J \frac{\rho_j}{1 - \rho_j} \Big) + K(N+2), \tag{23} \]

where $K = \max_{\sigma \in \mathcal{S}} \big( \sum_i \sigma_i \big)$.

2. The distribution of the total queue size has an exponential tail with exponent
given by

\[ \lim_{L \to \infty} \frac{1}{L} \log \mathbb{P}\Big( \sum_{i=1}^N Q_i \ge L \Big) = \max_{j = 1, \ldots, J} \log \rho_j. \tag{24} \]
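To make the two quantities in Theorem 5.2 concrete, the short sketch below evaluates the right-hand side of (23) and the exponent in (24) for given loads. The function names are hypothetical, and the numbers in the example (a 2-port switch with all four port loads equal to 0.9) are assumptions for illustration.

```python
from math import log


def expected_queue_bound(rhos, K, N):
    """Right-hand side of (23): (1/2) * sum_j rho_j / (1 - rho_j) + K * (N + 2)."""
    return 0.5 * sum(r / (1.0 - r) for r in rhos) + K * (N + 2)


def tail_exponent(rhos):
    """Right-hand side of (24): max_j log(rho_j); requires every rho_j > 0."""
    return max(log(r) for r in rhos)


# 2-port switch: N = 4 queues, J = 4 port constraints, K = 2 packets per schedule.
bound = expected_queue_bound([0.9, 0.9, 0.9, 0.9], K=2, N=4)
exponent = tail_exponent([0.9, 0.9, 0.9, 0.9])
```

The first term of the bound blows up as the largest load approaches 1, while the second term is a load-independent constant; this separation is what drives the heavy-traffic comparison in Section 5.2.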

5.2. Optimality of the policy. This section establishes the optimality of our policy
for input-queued switches, both with respect to expected total queue size scaling
and tail exponent. The policy produces an optimal tail exponent for any single-hop
switched network.

Scaling of queue sizes. We start by formalizing what we mean by the optimality of
expected queue sizes and of their tail exponents. We consider policies under which
there is a well-defined limiting stationary distribution of the queue sizes for all $\lambda$ such
that $\rho(\lambda) < 1$. Note that the class of policies is not empty; indeed, the maximum
weight policy and our policy are members of this class. With some abuse of notation,
let $\pi$ denote the stationary distribution of the queue-size vector under the policy of
interest. We are interested in two quantities:

1. Expected total queue size. Let $\bar Q$ be the expected total queue size under the
stationary distribution $\pi$, defined by

\[ \bar Q = \mathbb{E}_\pi\Big[ \sum_i Q_i \Big]. \]

Note that by ergodicity, the time average of the total queue size and the
expected total queue size under $\pi$ are the same quantity.

2. Tail exponent. Let $\beta_L(Q), \beta_U(Q) \in [-\infty, 0]$ be the lower and upper limits of the
tail exponent of the total queue size under $\pi$ (possibly $-\infty$ or $0$), respectively,
defined by

\[ \beta_L(Q) = \liminf_{\ell \to \infty} \frac{1}{\ell} \log \mathbb{P}_\pi\Big( \sum_i Q_i \ge \ell \Big), \tag{25} \]

\[ \text{and} \quad \beta_U(Q) = \limsup_{\ell \to \infty} \frac{1}{\ell} \log \mathbb{P}_\pi\Big( \sum_i Q_i \ge \ell \Big). \tag{26} \]

If $\beta_L(Q) = \beta_U(Q)$, then we denote this common value by $\beta(Q)$.

We are interested in policies that can achieve minimal $\bar Q$ and $\beta(Q)$. For tractability
reasons, we focus on the scaling of these quantities with respect to $\mathcal{S}$ (equivalently,
$N$) and $\rho(\lambda)$, as $1/(1 - \rho(\lambda))$ and $N$ increase. Now, for different $\lambda'$ and $\lambda$, it is possible
that $\rho(\lambda) = \rho(\lambda')$, but the scaling of $\bar Q$, for example, could be wildly different. For
this reason, we consider the worst possible dependence on $1/(1 - \rho)$ and $N$ among
all $\lambda$ with $\rho(\lambda) = \rho$.

Note that we are considering scalings with respect to two quantities $\rho$ and $N$,
and we are interested in two limiting regimes $\rho \to 1$ and $N \to \infty$. The optimality of
average queue size stated here is with respect to the order of limits $\rho \to 1$ and then
$N \to \infty$. As noted in [25], taking the limits in different orders could potentially result
in different limiting behaviors of the object of interest, e.g., $\bar Q$. For more discussions,
see Section 6. It should be noted, however, that the optimality of the tail exponent
holds for any $\rho$ and $N$.

Optimality of the tail exponent. Here we establish the optimality of the tail
exponent for any single-hop switched network under our policy. Consider any policy
under which there exists a well-defined limiting stationary distribution of the queue
sizes for all $\lambda$ such that $\rho(\lambda) < 1$. Let $\pi_Q$ denote the stationary distribution of queue
sizes under this policy. The optimality of the tail exponent under our policy is an
immediate consequence of the following lemma.

Lemma 5.3 Let $\pi_Q$ and $\lambda$ be as described. Let $\rho_1, \ldots, \rho_J$ be as defined in Proposition 4.3.
Then under $\pi_Q$,

\[ \beta_L(Q) \ge \max_{j = 1, 2, \ldots, J} \log \rho_j. \]
Proof. Recall that $\rho_j = \big( \sum_i R_{ji}\lambda_i \big)/C_j$, for $j = 1, 2, \ldots, J$, under the representation

\[ \langle\mathcal{S}\rangle = \{ x \in [0,1]^N : Rx \le C \}. \]

Without loss of generality, suppose that $\log \rho_1 = \max_{j = 1, 2, \ldots, J} \log \rho_j$. We now lower
bound the total queue size stochastically by that of an M/D/1 queue. Consider
an M/D/1 queue where the arrival rate is $\rho_1$, and the service capacity has a
deterministic rate of 1. Since in the original network, this service capacity has to be
shared among the queues, $\sum_i Q_i$ stochastically dominates this M/D/1 queue. Now
the stationary distribution of this M/D/1 queue has a tail exponent $\log \rho_1$, which
provides a lower bound on the same quantity in the original network, under $\pi_Q$. □

Input-queued switches. Here we argue the optimality of our policy for input-queued
switches. As discussed above, the tail exponent is optimal under our policy
for any switched network, and hence for input-queued switches. It remains to argue the
optimal scaling of the average total queue size under our policy for input-queued
switches. To that end, as argued in Shah et al. [25], when all input and output
ports approach critical load, the average total queue size under any policy for an
input-queued switch must scale at least as fast as $\sqrt{N}/(1 - \rho)$, for any $n$-port switch with
$N = n^2$ queues. For completeness, we include the proof of this lower bound here.
As in Section 2.4, we use double indexing.

imsart-aap ver. 2007/04/13 file: arxiv.tex date: October 24, 2011 






Lemma 5.4 Consider an $n$-port input-queued switch, with an arrival rate vector
$\lambda$. Suppose that the loads on all input and output ports are $\rho$, i.e.,
$\sum_{k=1}^n \lambda_{k,\ell} = \sum_{m=1}^n \lambda_{\ell',m} = \rho$ for all $\ell, \ell' \in \{1, 2, \ldots, n\}$, where $\rho \in (0,1)$. Consider any policy under
which the queue-size process has a well-defined limiting stationary distribution, and
let this distribution be denoted by $\pi_Q$. Then under $\pi_Q$, we must have

\[ \mathbb{E}_{\pi_Q}\Big[ \sum_{k,\ell=1}^n Q_{k,\ell} \Big] \ge \frac{n\rho}{2(1 - \rho)}. \]

Proof. We consider the sums of queue sizes at each output port, i.e., the quantities
$\sum_{k=1}^n Q_{k,\ell}$ for each $\ell \in \{1, 2, \ldots, n\}$. Since at most one packet can depart from each
output port at each time slot, $\sum_{k=1}^n Q_{k,\ell}$ stochastically dominates the queue size in an
M/D/1 system, with arrival rate $\rho$ and deterministic service rate 1. Therefore, for each
$\ell \in \{1, 2, \ldots, n\}$,

\[ \mathbb{E}_{\pi_Q}\Big[ \sum_{k=1}^n Q_{k,\ell} \Big] \ge \frac{\rho}{2(1 - \rho)}. \]

Here, $\frac{\rho}{2(1-\rho)}$ is the expected queue size in steady state in an M/D/1 system.
Summing over $\ell$ gives us the desired bound. □



The optimality in terms of the average total queue size is a direct consequence of 
Theorem 5.2 and Lemma 5.4. 

Corollary 5.5 Consider the same setup as in Lemma 5.4. Then in the heavy-traffic
limit $\rho \to 1$, our policy is 2-optimal in terms of the average total queue size. More
precisely, consider the expected total queue size in the diffusion scale in steady state,
i.e., $(1 - \rho)\bar Q$. Then

\[ \limsup_{\rho \to 1} (1 - \rho)\bar Q \le n \]

under our policy, and

\[ \liminf_{\rho \to 1} (1 - \rho)\bar Q \ge \frac{n}{2} \]

under any other policy.

Proof. Lemma 5.4 implies that

\[ \liminf_{\rho \to 1} (1 - \rho)\bar Q \ge \frac{n}{2} \]

under any policy. For the upper bound, note that by Theorem 5.2, and since $\rho_j = \rho < 1$
for every $j$ in this setup, under our policy,

\[ \bar Q \le \frac{J}{2(1 - \rho)} + (N+2)K. \]

For input-queued switches, $J \le 2n$, as remarked in Section 5.2, $N = n^2$, and $K = n$.
Therefore, we have that under our policy, the expected total queue size scales as

\[ \bar Q \le \frac{n}{1 - \rho} + (n^2 + 2)n. \tag{27} \]

Now consider the steady-state heavy-traffic scaling $(1 - \rho)\bar Q$. We have that

\[ (1 - \rho)\bar Q \le n + (1 - \rho)(n^2 + 2)n. \tag{28} \]

The term $(1 - \rho)(n^2 + 2)n$ goes to zero as $\rho \to 1$, and hence under our policy,

\[ \limsup_{\rho \to 1} (1 - \rho)\bar Q \le n. \]
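The factor-of-two gap between the two bounds can be checked numerically. The following sketch evaluates the diffusion-scaled upper bound (28) and the diffusion-scaled lower bound obtained from Lemma 5.4 for an $n$-port switch; the function names and the chosen values of $n$ and $\rho$ are assumptions for illustration only.

```python
def scaled_upper_bound(n, rho):
    """Right-hand side of (28): n + (1 - rho) * (n^2 + 2) * n."""
    return n + (1.0 - rho) * (n ** 2 + 2) * n


def scaled_lower_bound(n, rho):
    """(1 - rho) times the bound of Lemma 5.4, i.e. n * rho / 2."""
    return n * rho / 2.0


n = 4
for rho in (0.9, 0.99, 0.999999):
    upper = scaled_upper_bound(n, rho)
    lower = scaled_lower_bound(n, rho)
    print(f"rho={rho}: lower={lower:.6f}, upper={upper:.6f}, ratio={upper / lower:.4f}")
```

For fixed $n$ the ratio of the two bounds decreases towards 2 as $\rho \to 1$; for moderate loads the constant term $(n^2+2)n$ dominates and the ratio is much larger.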



Our policy is not optimal in terms of the average total queue size in general switched
networks. In cases where $J \gg N$, the moment bounds for the maximum-weight
policy give tighter upper bounds. For more discussions, see Section 6.

5.3. Proof of Theorem 5.2. The proof is divided into three parts. The first part
describes a sample-path-wise relation between $Q(\cdot)$ and $M(\cdot)$, which implies that
$Q(\cdot)$ is essentially dominated by $M(\cdot)$ at all times. Note that this domination is a
distribution-free statement. The second part utilizes this fact to establish the positive 
recurrence of the SN Markov chain. The third part, as a consequence of the first 
two parts, and using Theorem 4.1, establishes the quantitative claims in Theorem 
5.2. 



Part 1. Dominance. We start by establishing that the queue sizes $Q(\cdot)$ of SN are
effectively dominated by the workloads $W(\cdot)$ of BN at all times. We state this result
formally in Proposition 5.8, which is a consequence of Lemmas 5.6 and 5.7 below.

Lemma 5.6 Consider the evolution of queue sizes in both BN and SN networks
fed by identical arrival processes. Initially, $Q(0) = M(0) = 0$. Let $W(\tau) = (W_i(\tau))$
denote the amount of unfinished work in all $N$ queues under the BN network at
time $\tau$. Then for any $\tau \ge 0$ and $1 \le i \le N$,

\[ Q_i(\tau) \le W_i(\tau) + D_i(\tau) \le M_i(\tau) + D_i(\tau), \tag{29} \]

where $D(\tau) = A^{\mathrm{SFA}}(\tau) - B(\tau)$ is as described in Section 5.1.

Proof. Consider any $i \in \{1, 2, \ldots, N\}$ and $\tau \ge 0$. From (4), in SN,

\[ Q_i(\tau) = A_i(\tau) - B_i(\tau) + Z_i(\tau), \tag{30} \]

where $Z_i(\tau)$ is the cumulative amount of idling at the $i$th queue in SN. In a similar
manner, in BN,

\[ W_i(\tau) = A_i(\tau) - A_i^{\mathrm{SFA}}(\tau) + \tilde Z_i(\tau), \tag{31} \]

where $\tilde Z_i(\tau)$ is the cumulative amount of idling for the $i$th queue in BN. Since by
construction, $D(\tau) = A^{\mathrm{SFA}}(\tau) - B(\tau)$, and $D(\tau) \ge 0$, we have that

\[ B_i(\tau) \le A_i^{\mathrm{SFA}}(\tau) \le B_i(\tau) + D_i(\tau). \tag{32} \]

By definition, the instantaneous rate allocation to the $i$th queue satisfies
$\frac{d}{dt} A_i^{\mathrm{SFA}}(t) = 0$ if $W_i(t) = 0$ (equivalently, if $M_i(t) = 0$) for any $t \ge 0$. Therefore,
$\tilde Z_i(\tau) = 0$. On the other hand, by Skorohod's map,

\begin{align*}
Z_i(\tau) &= \sup_{0 \le s \le \tau} \big[ B_i(s) - A_i(s) \big]^+ \\
&\le \sup_{0 \le s \le \tau} \big[ A_i^{\mathrm{SFA}}(s) - A_i(s) \big]^+ \\
&= \tilde Z_i(\tau). \tag{33}
\end{align*}

From (32) and (33), it follows that

\begin{align*}
Q_i(\tau) &= A_i(\tau) - B_i(\tau) + Z_i(\tau) \\
&\le A_i(\tau) - A_i^{\mathrm{SFA}}(\tau) + D_i(\tau) + Z_i(\tau) \\
&\le A_i(\tau) - A_i^{\mathrm{SFA}}(\tau) + D_i(\tau) + \tilde Z_i(\tau) \\
&= W_i(\tau) + D_i(\tau). \tag{34}
\end{align*}

Since the workload at the $i$th queue equals the total amount of unfinished work for
all of the $M_i(\tau)$ packets waiting at the $i$th queue, and since each packet has at most
a unit amount of unfinished work, $W_i(\tau) \le M_i(\tau)$. □

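Lemma 5.7 Let $D(\tau)$ be as in Lemma 5.6. For all $\tau \ge 0$, $\rho(D(\tau)) \le N + 2$. In
particular,

\[ \sum_i D_i(\tau) \le K(N+2), \quad \text{where } K = \max_{\sigma \in \mathcal{S}} \sum_i \sigma_i. \tag{35} \]

Proof. This result is established as follows. First, observe that $D(0) = 0$ and therefore
$\rho(D(0)) = 0$. Next, we show that $\rho(D(\tau+1)) \le \rho(D(\tau)) + 1$. That is, $\rho(D(\cdot))$ can
at most increase by 1 in each time slot. And finally, we show that it cannot increase
once it exceeds $N + 1$. That is, if $\rho(D(\tau)) \ge N + 1$, then $\rho(D(\tau+1)) \le \rho(D(\tau))$.
This will complete the proof.

We start by establishing that $\rho(D(\cdot))$ increases by at most 1 in unit time. By
definition,

\begin{align*}
D(\tau+1) &= A^{\mathrm{SFA}}(\tau+1) - B(\tau+1) \\
&= A^{\mathrm{SFA}}(\tau) - B(\tau) + \big( A^{\mathrm{SFA}}(\tau+1) - A^{\mathrm{SFA}}(\tau) \big) - dB(\tau) \\
&= D(\tau) + dA^{\mathrm{SFA}}(\tau) - dB(\tau) \\
&= \big( D(\tau) - dB(\tau) \big) + dA^{\mathrm{SFA}}(\tau), \tag{36}
\end{align*}

where $dA^{\mathrm{SFA}}(\tau) = A^{\mathrm{SFA}}(\tau+1) - A^{\mathrm{SFA}}(\tau)$. As remarked earlier, $dB(\tau) \le D(\tau)$
component-wise. Therefore, by (10) it follows that

\[ \rho(D(\tau+1)) \le \rho\big( D(\tau) - dB(\tau) \big) + \rho\big( dA^{\mathrm{SFA}}(\tau) \big). \]

Note that $\rho(dA^{\mathrm{SFA}}(\tau)) \le 1$ because the instantaneous service rate under SFA
is always admissible. Since $D(\tau) \ge D(\tau) - dB(\tau) \ge 0$, any feasible solution to
PRIMAL($D(\tau)$) is also feasible to PRIMAL($D(\tau) - dB(\tau)$), and hence

\[ \rho\big( D(\tau) - dB(\tau) \big) \le \rho(D(\tau)). \]

Hence it follows that

\[ \rho(D(\tau+1)) \le \rho(D(\tau)) + 1. \tag{37} \]

Next, we shall argue that if $\rho(D(\tau)) \ge N + 1$, then $\rho(D(\tau+1)) \le \rho(D(\tau))$. To that
end, suppose that $\rho(D(\tau)) \ge N + 1$. Now $\frac{1}{\rho(D(\tau))} D(\tau) \in \langle\mathcal{S}\rangle$. Note that $\langle\mathcal{S}\rangle$ is a
convex set in an $N$-dimensional space with extreme points contained in $\mathcal{S}$. Therefore,
by Carathéodory's theorem, $\frac{1}{\rho(D(\tau))} D(\tau)$ can be written as a convex combination of
at most $N + 1$ elements in $\mathcal{S}$. That is, there exist $\alpha_k \ge 0$ with $\sum_{k=1}^{N+1} \alpha_k = 1$, and
$\sigma^k \in \mathcal{S}$, $k \in \{1, 2, \ldots, N+1\}$, such that

\[ \frac{1}{\rho(D(\tau))} D(\tau) = \sum_{k=1}^{N+1} \alpha_k \sigma^k. \tag{38} \]

Therefore, there exists some $k^* \in \{1, 2, \ldots, N+1\}$ such that $\alpha_{k^*} \ge 1/(N+1)$. Since
$\rho(D(\tau)) \ge N + 1$, $\rho(D(\tau))\alpha_{k^*} \ge 1$. That is, $D(\tau)$ can be written as a non-negative
combination of elements from $\mathcal{S}$ with one of them, $\sigma^{k^*}$, having an associated coefficient
that satisfies $\rho(D(\tau))\alpha_{k^*} \ge 1$, as required. In this case, we have

\[ D(\tau) - \sigma^{k^*} = \sum_{k=1, k \ne k^*}^{N+1} \rho(D(\tau))\alpha_k \sigma^k + \big( \rho(D(\tau))\alpha_{k^*} - 1 \big) \sigma^{k^*}. \tag{39} \]

Therefore,

\[ \rho\big( D(\tau) - \sigma^{k^*} \big) \le \rho(D(\tau)) - 1. \tag{40} \]

Our scheduling policy chooses such a schedule, i.e., $\sigma^{k^*}$; that is, $dB(\tau) = \sigma^{k^*}$.
Therefore,

\[ D(\tau+1) = D(\tau) - \sigma^{k^*} + dA^{\mathrm{SFA}}(\tau). \tag{41} \]

By another application of (10) it follows that

\begin{align*}
\rho(D(\tau+1)) &\le \rho\big( D(\tau) - \sigma^{k^*} \big) + \rho\big( dA^{\mathrm{SFA}}(\tau) \big) \\
&\le \rho(D(\tau)) - 1 + 1 \\
&= \rho(D(\tau)), \tag{42}
\end{align*}

where again we have used the fact that $\rho(dA^{\mathrm{SFA}}(\tau)) \le 1$, due to the feasibility of
the SFA policy, and (40). This establishes that $\rho(D(\tau)) \le N + 2$ for all $\tau \ge 0$. That is,
for each $\tau \ge 0$, there exist $\alpha_\sigma \ge 0$ for all $\sigma \in \mathcal{S}$, with $\sum_\sigma \alpha_\sigma \le N + 2$ and

\[ D(\tau) \le \sum_\sigma \alpha_\sigma \sigma. \tag{43} \]

Therefore,

\begin{align*}
\sum_i D_i(\tau) = D(\tau) \cdot \mathbf{1} &\le \sum_\sigma \alpha_\sigma \, (\sigma \cdot \mathbf{1}) \\
&\le \Big( \sum_\sigma \alpha_\sigma \Big) K \\
&\le (N+2)K, \tag{44}
\end{align*}

where $K = \max_{\sigma \in \mathcal{S}} \sum_i \sigma_i$. This completes the proof of Lemma 5.7. □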
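The step from the two one-slot inequalities to the uniform bound is a short induction on the time slot, which can be spelled out as follows. This is a sketch of the implicit argument, using only (37) and (42) and the initial condition $D(0) = 0$.

```latex
% Induction on \tau for the claim \rho(D(\tau)) \le N + 2.
\begin{align*}
\text{Base case:}\quad
  & \rho(D(0)) = 0 \le N + 2. \\
\text{Case } \rho(D(\tau)) < N + 1:\quad
  & \rho(D(\tau+1)) \le \rho(D(\tau)) + 1 < N + 2
  && \text{by (37)}. \\
\text{Case } N + 1 \le \rho(D(\tau)) \le N + 2:\quad
  & \rho(D(\tau+1)) \le \rho(D(\tau)) \le N + 2
  && \text{by (42)}.
\end{align*}
```

In both cases the bound $\rho(D(\tau+1)) \le N + 2$ is preserved, so it holds for all $\tau \ge 0$.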

Lemmas 5.6 and 5.7 together imply the following proposition.

Proposition 5.8 Let $Q(\cdot)$, $W(\cdot)$ and $M(\cdot)$ be as in Lemma 5.6. Then

\[ \sum_{i=1}^N Q_i(\tau) \le \sum_{i=1}^N W_i(\tau) + K(N+2) \le \sum_{i=1}^N M_i(\tau) + K(N+2), \tag{45} \]

where $K = \max_{\sigma \in \mathcal{S}} \big( \sum_{i=1}^N \sigma_i \big)$.

Proof. We obtain the bounds (45) by summing inequality (29) over $i \in \{1, 2, \ldots, N\}$,
and using the bound (35). □

Part 2. Positive recurrence. We start by defining the Markov chain describing 
the system evolution under the policy of interest. There are essentially two systems 
that evolve in a coupled manner under our policy: the virtual bandwidth-sharing 
network BN and the switched network SN of interest. The two networks are fed by 
the same arrival processes, which are exogenous and Poisson (and hence Markov). The
virtual system BN has a Markovian state consisting of the packets whose services
are not completed, represented by the vector $M(\cdot)$, and their residual services. The
residual services of the $M_i(\cdot)$ packets queued on route $i$ can be represented by a
non-negative, finite measure $\mu_i(\cdot)$ on $[0,1]$: unit mass is placed at each of the points
$0 \le s_1, \ldots, s_{M_i(t)} \le 1$ if the unfinished work of the $M_i(t)$ packets is given by
$0 \le s_1, \ldots, s_{M_i(t)} \le 1$.

We now consider a Markovian description of the network SN in discrete time: let
$X(\tau)$ be the state of the system defined as

\[ X(\tau) = \big( M(\tau), \mu(\tau), Q(\tau), D(\tau) \big), \tag{46} \]

where $(M(\tau), \mu(\tau))$ represents the state of BN at time $\tau$, $Q(\tau)$ is the vector of
queue sizes in SN at time $\tau$ and $D(\tau)$ is the "difference" vector maintained by the
scheduling policy for SN, as described in Section 5.1. Clearly, $X(\cdot)$ is Markov. Now
$X(\cdot) \in \mathcal{X}$ where $\mathcal{X}$ is the product space

\[ \mathcal{X} = \mathbb{Z}_+^N \times \mathcal{M}([0,1])^N \times \mathbb{Z}_+^N \times \mathbb{R}_+^N, \]

where $\mathcal{M}([0,1])$ is the space of all non-negative, finite measures on $[0,1]$. We endow
$\mathcal{M}([0,1])$ with the weak topology, which is induced by Prohorov's metric. This
results in a complete and separable metric (Polish) space. The other spaces $\mathbb{Z}_+$ and
$\mathbb{R}_+$ are endowed with obvious metrics (e.g., $\ell_1$). The entire product space is endowed
with the metric that is the maximum of the metrics on the component spaces. The resulting
product space is Polish. Let $\mathcal{B}_{\mathcal{X}}$ be the Borel $\sigma$-algebra of $\mathcal{X}$.

Given the Markovian description X(t) of SN, we establish its positive (Harris) 
recurrence in the following lemma. 

Lemma 5.9 Consider a switched network SN with a strictly admissible arrival rate
vector $\lambda$, with $\rho(\lambda) < 1$. Let $X(\cdot)$ be as defined in Eq. (46). Then $X(\cdot)$ is positive
recurrent.

The proof of the lemma is technical, and is deferred to Appendix C. The idea is that
the evolution of BN is independent of SN, and that BN is, on its own, positive
recurrent. Hence, starting from any initial state, the Markov process $(M(\cdot), \mu(\cdot))$
that describes the evolution of BN reaches the null state, i.e., $(M(\cdot), \mu(\cdot)) = 0$, at
some finite expected time. Once BN reaches the null state, it stays at this state for
an arbitrarily large amount of time with positive probability. By our policy, $Q(\cdot)$ and
$D(\cdot)$ can be driven to $0$ within this time interval. This establishes that $X(\cdot)$ reaches
the null state in finite expected time, and that $X(\cdot)$ is positive recurrent.

Part 3. Completing the proof. The positive recurrence of the Markov chain
$X(\cdot)$ implies that it possesses a unique stationary distribution and it is ergodic. Let
$\bar W = \mathbb{E}\big[ \sum_{i=1}^N W_i \big]$, where, similar to Lemma 5.6, $W_i$ is the steady-state workload
on queue $i$ in BN. Define $\bar M$ similarly. By ergodicity, the time average of the total
queue size equals the expected total queue size in steady state, i.e., $\bar Q$, and similarly
for $\bar W$. Therefore, by Proposition 5.8,

\[ \bar Q \le \bar W + K(N+2). \]

We now claim that

\[ \bar W \le \frac{1}{2} \sum_{j=1}^J \frac{\rho_j}{1 - \rho_j}. \]

First, we have that

\[ \bar M \le \sum_{j=1}^J \frac{\rho_j}{1 - \rho_j}. \]

By Propositions 4.2 and 4.3, $M$ is the sum of $J$ independent geometric random
variables, with parameters $1 - \rho_1, 1 - \rho_2, \ldots, 1 - \rho_J$. Hence, in fact, we have

\[ \bar M = \sum_{j=1}^J \frac{\rho_j}{1 - \rho_j}. \]

By Theorem 4.1, the individual residual workload in steady state is independent
from the number of packets in the network, and is uniformly distributed on $[0,1]$.
Therefore, $\bar W = \frac{1}{2}\bar M$, and the claim is proved.

We now establish the tail exponent in (24). By Lemma 5.3,

\[ \beta_L(Q) \ge \max_{j = 1, 2, \ldots, J} \log \rho_j, \]

so we only need to show that

\[ \beta_U(Q) \le \max_{j = 1, 2, \ldots, J} \log \rho_j, \]

where $\beta_L(Q)$ and $\beta_U(Q)$ are defined in (25) and (26) respectively.

First note that $\beta_U(Q) \le \beta_U(M)$, where $M$ is the queue-size vector of the virtual
system BN. This is because, by Proposition 5.8,

\[ \sum_{i=1}^N Q_i(\tau) \le \sum_{i=1}^N M_i(\tau) + K(N+2), \]

deterministically and for all times $\tau$. Thus, in steady state, $\sum_{i=1}^N Q_i$ is upper bounded
by $\sum_{i=1}^N M_i + K(N+2)$, deterministically. Since $K(N+2)$ is a constant, $\sum_{i=1}^N M_i + K(N+2)$
and $\sum_{i=1}^N M_i$ have the same tail exponent. This establishes that
$\beta_U(Q) \le \beta_U(M)$.

We now consider $\beta_U(M)$. As noted earlier, in steady state, $M$ is the sum of $J$
independent geometric random variables, with parameters $1 - \rho_1, 1 - \rho_2, \ldots, 1 - \rho_J$.
The following lemma states that the tail exponent of the sum of these $J$ geometric
random variables is upper bounded by $\max_{j = 1, 2, \ldots, J} \log \rho_j$.

Lemma 5.10 Let $M$ be the sum of $J$ independent geometric random variables, with
parameters $1 - \rho_1, \ldots, 1 - \rho_J$ respectively, where $\rho_j \in [0, 1)$ for all $j \in \{1, 2, \ldots, J\}$.
Then we have

\[ \limsup_{\ell \to \infty} \frac{1}{\ell} \log \mathbb{P}(M \ge \ell) \le \max_{j = 1, 2, \ldots, J} \log \rho_j. \]

The proof is provided in Appendix D. 
In conclusion,

\[ \beta_U(Q) \le \beta_U(M) = \max_{j = 1, 2, \ldots, J} \log \rho_j. \]
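The tail behavior described by Lemma 5.10 can be observed numerically. The sketch below computes $\mathbb{P}(M \ge \ell)$ exactly, by convolving truncated geometric distributions in rational arithmetic, and compares $\frac{1}{\ell}\log\mathbb{P}(M \ge \ell)$ with $\max_j \log\rho_j$. The function name and the loads $1/2$ and $4/5$ are assumptions chosen for illustration; exact fractions are used because the tail is far too small to obtain as one minus a floating-point sum.

```python
from fractions import Fraction
from math import log


def tail_of_geometric_sum(rhos, ell):
    """P(M >= ell), ell >= 1, where M is a sum of independent geometric
    variables with P(G_j = k) = rho_j^k * (1 - rho_j), k = 0, 1, 2, ..."""
    # pmf of the partial sum restricted to {0, ..., ell - 1}; starts as a point mass at 0.
    pmf = [Fraction(1)] + [Fraction(0)] * (ell - 1)
    for rho in rhos:
        geo = [(1 - rho) * rho ** k for k in range(ell)]
        # The convolution is exact on {0, ..., ell - 1}, since larger values
        # of either summand cannot contribute to these entries.
        pmf = [sum(pmf[a] * geo[k - a] for a in range(k + 1)) for k in range(ell)]
    return 1 - sum(pmf)


rhos = [Fraction(1, 2), Fraction(4, 5)]
ell = 100
decay_rate = log(float(tail_of_geometric_sum(rhos, ell))) / ell
limit = max(log(float(r)) for r in rhos)
print(decay_rate, limit)
```

At a finite $\ell$ the normalized log-tail need not lie below the limit, since a sub-exponential prefactor contributes a term of order $1/\ell$; the comparison is only meaningful asymptotically.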

6. Discussion. We presented a novel scheduling policy for a generic single-hop
switched network model. The policy, in effect, emulates the so-called store-and-forward
allocation (SFA) continuous-time bandwidth-sharing policy. The insensitivity property
of SFA along with the relation of its stationary distribution with that of multi-class 
queueing network leads to the explicit characterization of the stationary distribution 
of queue sizes induced by our policy. This allows us to establish the optimality of our 
policy in terms of tail exponent for any single-hop switched network and that with 
respect to the average total queue size for a class of switched networks, including 
the input-queued switches. As a consequence, this settles a conjecture stated in [25]. 
On the technical end, a key contribution of the paper is creating a discrete-time 
scheduling policy from a continuous-time rate allocation policy, and this on its own 
may be of potential interest in other domains of applications. 

The switched network model considered here requires the arrival processes to 
be Poisson. However, this is not a major restriction, due to a Poissonization trick 
considered, for example, in [9] and [14]: all arriving packets are first passed through
a 'regularizer', which emits packets according to a Poisson process with a rate
that lies between the arrival rate and the network capacity. This leads to the arrivals
being effectively Poisson, as seen by the system, with a somewhat higher rate; by
choosing the rate of the 'regularizer' appropriately, the effective gap to the capacity,
i.e., $(1 - \rho)$, is decreased by a factor of 2.

The scheduling policy that we propose is not optimal for general switched net- 
works. For example, in the context of ad-hoc wireless networks, in the independent- 
set model, there are as many constraints as the number of edges in the interference 
graph, which is often much larger than the number of nodes. Under our policy, the 
average total queue size would scale with the number of edges, whereas maximum- 
weight policy achieves a scaling with the number of nodes. 

There are many possible directions for future research. One direction is the search 
for low-complexity and optimal scheduling policies. In the context of input-queued 
switches, our policy has a complexity that is exponential in $N$, the number of queues,
because one has to compute the sum of exponentially many terms at every time
instance. This raises the question of finding an optimal policy with polynomial
complexity in $N$. One candidate is the class of MW-$\alpha$ policies, which have polynomial
complexity, but their optimality appears difficult to analyze. Another possible candidate could be,
as discussed in the introduction, a randomized version of proportional fairness. The 
relationship between SFA and proportional fairness is explored in [34], and indeed, 
in a certain sense, SFA converges to proportional fairness. The question remains 
whether (a version of) proportional fairness is optimal for input-queued switches. 

Another interesting direction to pursue has to do with the analysis of different limiting
regimes. We are interested in two limits: $N \to \infty$, and $\rho \to 1$, where $N$ is the
number of queues, and $\rho$ is the system load. Again, take the example of input-queued
switches. In this paper, we have considered the heavy-traffic limit, i.e., $\rho \to 1$, and
shown that our policy is optimal. However, if we take the limit $N \to \infty$, while keeping
$\rho$ fixed, then the average total queue size under our policy scales as $N^{3/2}$, whereas the
maximum-weight policy produces a bound of $N$. A more interesting question is in the regime
where $(1 - \rho)\sqrt{N}$ remains bounded, and where $N \to \infty$. In this regime, under our policy,
under the maximum-weight policy, and under the batching policy in [21], the average
total queue sizes all scale as $N^{3/2}$. In [23], the authors devise a policy that achieves
$N^{\gamma}$ scaling, for some $\gamma \in [1, 3/2)$.

APPENDIX A: PROPERTIES OF SFA

This section proves results stated in Section 4, specifically Theorem 4.1 and
Propositions 4.2 and 4.3. First, we note that Propositions 4.2 and 4.3 are fairly easy
consequences of Theorem 4.1, and their proofs are included for completeness. We
then prove Theorem 4.1, as a consequence of the work of Zachary [37].

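Proof (of Proposition 4.2). To verify (18), we can calculate both sides of the equation
directly. Note that by definition, $\tilde m_j = \sum_{i : j \in i} \tilde m_{ji}$, so

\[ \tilde\pi\Big( \Big\{ \tilde m : \sum_{j=1}^J \tilde m_j = L \Big\} \Big) = \tilde\pi\Big( \Big\{ \tilde m : \sum_{(j,i) \in \mathcal{K}} \tilde m_{ji} = L \Big\} \Big). \tag{47} \]

On the other hand,

\begin{align*}
\pi\Big( \Big\{ m : \sum_{i=1}^N m_i = L \Big\} \Big)
&= \frac{1}{\Phi} \sum_{m \in \mathbb{Z}_+^N} \mathbb{I}\Big\{ \sum_{i=1}^N m_i = L \Big\} \, \Phi(m) \prod_{i=1}^N \lambda_i^{m_i} \tag{48} \\
&= \frac{1}{\Phi} \sum_{m \in \mathbb{Z}_+^N} \mathbb{I}\Big\{ \sum_{i=1}^N m_i = L \Big\} \sum_{\tilde m \in U(m)} \prod_{i=1}^N \lambda_i^{m_i} \prod_{j=1}^J \left( \binom{\tilde m_j}{\tilde m_{ji} : i \ni j} \prod_{i : j \in i} \left( \frac{R_{ji}}{C_j} \right)^{\tilde m_{ji}} \right) \tag{49} \\
&= \frac{1}{\Phi} \sum_{m \in \mathbb{Z}_+^N} \mathbb{I}\Big\{ \sum_{i=1}^N m_i = L \Big\} \sum_{\tilde m \in U(m)} \prod_{j=1}^J \left( \binom{\tilde m_j}{\tilde m_{ji} : i \ni j} \prod_{i : j \in i} \left( \frac{R_{ji}\lambda_i}{C_j} \right)^{\tilde m_{ji}} \right) \tag{50} \\
&= \frac{1}{\Phi} \sum_{\tilde m \in \mathbb{Z}_+^{|\mathcal{K}|}} \mathbb{I}\Big\{ \sum_{(j,i) \in \mathcal{K}} \tilde m_{ji} = L \Big\} \prod_{j=1}^J \left( \binom{\tilde m_j}{\tilde m_{ji} : i \ni j} \prod_{i : j \in i} \left( \frac{R_{ji}\lambda_i}{C_j} \right)^{\tilde m_{ji}} \right) \tag{51} \\
&= \tilde\pi\Big( \Big\{ \tilde m : \sum_{(j,i) \in \mathcal{K}} \tilde m_{ji} = L \Big\} \Big). \tag{52}
\end{align*}

The equality (48) follows from the definition of $\pi$ given in (15), (49) follows from
the definition of $\Phi(m)$ given in (13), (50) follows from the fact that for $\tilde m \in U(m)$,
$\sum_{j : j \in i} \tilde m_{ji} = m_i$ for all $i$, (51) follows from the fact that

\[ \sum_{m \in \mathbb{Z}_+^N} \mathbb{I}\Big\{ \sum_{i=1}^N m_i = L \Big\} \sum_{\tilde m \in U(m)} (\,\cdot\,) = \sum_{\tilde m \in \mathbb{Z}_+^{|\mathcal{K}|}} \mathbb{I}\Big\{ \sum_{(j,i) \in \mathcal{K}} \tilde m_{ji} = L \Big\} (\,\cdot\,), \]

and (52) follows from the definition of $\tilde\pi$ given in (17). So, (47) and (52) together
establish (18). □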

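Proof (of Proposition 4.3). We can verify (19) directly. Indeed,

\begin{align*}
\tilde\pi\big( \{ \tilde m_j = L_j : j = 1, 2, \ldots, J \} \big)
&= \frac{1}{\Phi} \sum_{\tilde m \in \mathbb{Z}_+^{|\mathcal{K}|}} \mathbb{I}\Big\{ \sum_{i : j \in i} \tilde m_{ji} = L_j \text{ for all } j \Big\} \prod_{j=1}^J \left( \binom{\tilde m_j}{\tilde m_{ji} : i \ni j} \prod_{i : j \in i} \left( \frac{R_{ji}\lambda_i}{C_j} \right)^{\tilde m_{ji}} \right) \tag{53} \\
&= \frac{1}{\Phi} \prod_{j=1}^J \left( \frac{\sum_{i : j \in i} R_{ji}\lambda_i}{C_j} \right)^{L_j} \tag{54} \\
&= \prod_{j=1}^J \rho_j^{L_j} (1 - \rho_j).
\end{align*}

Equality (53) follows from the definition of $\tilde\pi$ in (17). Equality (54) collects all terms
in the multinomial expansion of $\big( \sum_{i : j \in i} R_{ji}\lambda_i / C_j \big)^{L_j}$ for each $j$, and the last
equality uses the definition of $\rho_j$ together with (16), which gives $1/\Phi = \prod_{j=1}^J (1 - \rho_j)$.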

We now provide justification for Theorem 4.1. Consider a bandwidth-sharing
network model as described in Section 4. Instead of having packets requiring a unit
amount of service, suppose each route-$i$ packet has a service requirement that is
independent and identically distributed with distribution $\mu_i$ and mean 1. We note that such
bandwidth-sharing networks are a special case of the processor-sharing (PS) queueing
network model, as considered by Zachary [37]. In particular, a bandwidth-sharing
network is a processor-sharing network in which jobs depart the network after
completing service. General insensitivity results for bandwidth-sharing networks
follow as a consequence of the work of Zachary [37].

Following Zachary [37], for $i \in \{1, 2, \dots, N\}$, we define the probability distribution
$\tilde\mu_i$ to be the stationary residual life distribution of the renewal process with
inter-event distribution $\mu_i$. That is, if $\mu_i$ has cumulative distribution function $F$, then $\tilde\mu_i$
has distribution function $G$ given by (recall that $\mu_i$ has mean 1)

$$G(x) = \int_0^x \big(1 - F(z)\big)\, dz, \qquad x \ge 0. \tag{55}$$

Note that if the service requests are deterministically 1, i.e., $\mu_i$ is the distribution
of the deterministic constant 1, then $\tilde\mu_i$ is a uniform distribution on $[0,1]$, for all
$i \in \{1, 2, \dots, N\}$.
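
The standard residual-life formula for a mean-1 distribution, $G(x) = \int_0^x (1 - F(z))\,dz$, can be evaluated numerically to see the uniform law appear for deterministic unit service. The sketch below uses a simple midpoint rule; the function names and the step count are illustrative choices, and the exponential case is included only as a familiar sanity check (its residual life is again exponential).

```python
from math import exp


def residual_life_cdf(F, x, steps=10_000):
    """Approximate G(x) = integral_0^x (1 - F(z)) dz by the midpoint rule.

    F is the CDF of a service distribution with mean 1 (assumed, not checked).
    """
    if x <= 0:
        return 0.0
    h = x / steps
    return sum((1.0 - F((k + 0.5) * h)) * h for k in range(steps))


def deterministic_cdf(z):
    """CDF of the constant 1."""
    return 1.0 if z >= 1.0 else 0.0


def exponential_cdf(z):
    """CDF of the rate-1 exponential distribution (mean 1)."""
    return 1.0 - exp(-z) if z >= 0 else 0.0


# Deterministic unit service: G should match the uniform CDF min(x, 1).
uniform_values = {x: residual_life_cdf(deterministic_cdf, x)
                  for x in (0.25, 0.5, 1.0, 2.0)}

# Exponential service: G should match the exponential CDF itself.
exp_value = residual_life_cdf(exponential_cdf, 1.0)
```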



Insensitive rate allocation. Consider a bandwidth-sharing network, as described
above, with rate allocation $\phi(\cdot)$. If the Markov process $X(t)$ admits an invariant
measure, then it induces an invariant measure $\pi$ on the process $M(t)$. Such a $\pi$,
when it exists, is called insensitive if it depends on the statistics of the arrivals and
service requests only through the parameters $\lambda = (\lambda_i)_{i=1}^{N}$; in particular, it does not
depend on the detailed service distributions of incoming packets. A rate allocation
$\phi(\cdot) = (\phi_i(\cdot))_{i=1}^{N}$ is called insensitive if it induces an insensitive invariant measure $\pi$
on $M(t)$.

It turns out that if the rate allocation satisfies a balance property, then it is
insensitive.











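Definition A.1 (Definition 1, [4]). Consider the bandwidth-sharing network just
described. The rate allocation $\phi(\cdot)$ is balanced if there exists a function $\Phi : \mathbb{Z}^N \to \mathbb{R}_+$
with $\Phi(0) = 1$, and $\Phi(m) = 0$ for all $m \notin \mathbb{Z}_+^N$, such that

$$\phi_i(m) = \frac{\Phi(m - e_i)}{\Phi(m)} \quad \text{for all } m \in \mathbb{Z}_+^N,\ i \in \{1, 2, \dots, N\}. \tag{56}$$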

Bonald and Proutiere [3] proved that a balanced rate allocation is insensitive with 
respect to all phase-type service distributions. Zachary [37] showed that a balanced 
rate allocation is indeed insensitive with respect to all general service distributions. 
He also gave the characterization of the distribution of the residual workloads in 
steady state. 
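
As a hypothetical illustration of Definition A.1 (this example is not taken from the paper), consider a single resource of capacity $C$ shared by $N$ routes, with balance function $\Phi(m) = \binom{|m|}{m_1, \dots, m_N} C^{-|m|}$, where $|m| = \sum_i m_i$. The sketch below computes $\phi_i(m) = \Phi(m - e_i)/\Phi(m)$ exactly and recovers the processor-sharing allocation $\phi_i(m) = C\, m_i / |m|$, which uses the full capacity whenever the system is nonempty. The function names are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial


def balance(m, cap):
    """Phi(m) = multinomial(|m|; m) / cap**|m|, and 0 if any entry is negative."""
    if any(x < 0 for x in m):
        return Fraction(0)
    coeff = factorial(sum(m))
    for x in m:
        coeff //= factorial(x)
    return Fraction(coeff) / Fraction(cap) ** sum(m)


def rate(m, i, cap):
    """phi_i(m) = Phi(m - e_i) / Phi(m), the balanced allocation of (56)."""
    m_minus = list(m)
    m_minus[i] -= 1
    return balance(m_minus, cap) / balance(m, cap)


# Hypothetical state: 2, 1 and 3 packets on three routes, capacity 2.
m, cap = (2, 1, 3), 2
rates = [rate(m, i, cap) for i in range(len(m))]
```

Here `rates` holds one exact rational per route, and their sum equals the capacity. A route with no packets receives rate 0, because $\Phi$ vanishes outside $\mathbb{Z}_+^N$.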

Theorem A.2 (Theorem 2, [37]). Consider the bandwidth-sharing network described
earlier. A measure $\pi$ on $\mathbb{Z}_+^N$ is stationary for $M(t)$ and is insensitive to
all service distributions with mean 1, if and only if it is related to the rate allocation
$\phi$ as follows:

$$\pi(m)\, \phi_i(m) = \pi(m - e_i)\, \lambda_i, \quad \text{for all } m \in \mathbb{Z}_+^N,\ i \in \{1, 2, \dots, N\}, \tag{57}$$

where we set $\pi(m - e_i)$ to be 0 if $m_i = 0$. Consequently, $\pi$ is given by the expression

$$\pi(m) = \Phi(m) \prod_{i=1}^{N} \lambda_i^{m_i}. \tag{58}$$

Furthermore, if $\pi$ can be normalized to a probability distribution, then $X(t)$ is
positive recurrent, and the residual workload of each class-$i$ packet in the network in
steady state is distributed as $\tilde\mu_i$ and, in steady state, is conditionally independent
of the residual workloads of other packets, when we condition on the number of
packets on each route of the network.

Note that Conditions (56) and (57) are equivalent. Suppose that $\phi(\cdot)$ satisfies (56);
then an invariant measure $\pi$ is given by (58). Substituting Eq. (58) into Eq. (56)
gives Eq. (57). Conversely, if Eq. (57) is satisfied, then we can just set
$\Phi(m) = \pi(m) / \prod_{i=1}^{N} \lambda_i^{m_i}$, and Eqs. (56) and (58) are satisfied.

Proof (of Theorem 4.1). Theorem 4.1 is now a fairly easy consequence of Theorem
A.2. Consider a bandwidth-sharing network described in Section 4. The
additional structures are the capacity constraints (11), and that arriving
packets require only a unit amount of service, deterministically. The capacity
constraints (11) impose the necessary condition for stability, given by (12). Recall that
all arrival rate vectors $\lambda$ that satisfy $R\lambda < C$ are called strictly admissible.

Consider the bandwidth vector $\phi$ as defined by (13) and (14). As remarked earlier,
$\phi$ is admissible, i.e., it satisfies the capacity constraints (11). It is balanced by
definition, and hence insensitive by Theorem A.2. Thus, it induces a stationary
measure $\pi$ on the queue-size vector $M(t)$, given by (58). For a strictly admissible
arrival rate vector $\lambda$, the measure is finite, with the normalizing constant $\Phi$ given
by (16). Hence, we can normalize $\pi$ to obtain the unique stationary probability
distribution for $M(t)$.

Finally, using Theorem A.2 and the fact that all service requests are
deterministically 1, we see that the stationary residual workloads are all uniformly distributed
on $[0, 1]$ and independent. □

APPENDIX B: PROOF OF LEMMA 5.1 

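We introduce an optimization problem PRIMAL'($\lambda$), which is similar to PRIMAL($\lambda$),
and which is defined to be

\begin{align}
\text{minimize} \quad & \sum_{\sigma \in \mathcal{S}} \alpha_\sigma \tag{59} \\
\text{subject to} \quad & \lambda = \sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma, \tag{60} \\
& \alpha_\sigma \in \mathbb{R}_+, \quad \text{for all } \sigma \in \mathcal{S}. \tag{61}
\end{align}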

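Clearly, a solution of PRIMAL'($D$) is a feasible solution for PRIMAL($D$).
Therefore, to prove the lemma, it is sufficient to find $(\alpha_\sigma)_{\sigma \in \mathcal{S}}$ that is an optimal
solution for PRIMAL($D$) and satisfies $\sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma = D$.

Let $(\alpha_\sigma)_{\sigma \in \mathcal{S}}$ be an optimal solution to PRIMAL($D$). Then

$$\sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma \ge D.$$

If all the inequality constraints are tight, then there is nothing to prove. Therefore,
suppose that

$$\theta_i = \sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma_i > D_i,$$

for some $i \in \{1, 2, \dots, N\}$. We now modify $(\alpha_\sigma)_{\sigma \in \mathcal{S}}$ to reduce the 'gap' between
$\sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma_i$ and $D_i$.

Indeed, since $\sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma_i > D_i \ge 0$, there is some $\bar\sigma \in \mathcal{S}$ such that $\bar\sigma_i = 1$ and
$\alpha_{\bar\sigma} > 0$. Now let $\hat\sigma \in \mathcal{S}$ be such that $\hat\sigma_k = \bar\sigma_k$ for all $k \ne i$, and let $\hat\sigma_i = 0$. Such a $\hat\sigma$
exists by Assumption 2.1. Let $\varepsilon = \min(\alpha_{\bar\sigma},\ \theta_i - D_i)$ and define $(\alpha'_\sigma)_{\sigma \in \mathcal{S}}$ by moving
weight $\varepsilon$ from $\bar\sigma$ to $\hat\sigma$, so that

$$\sum_{\sigma \in \mathcal{S}} \alpha'_\sigma \sigma = \sum_{\sigma \in \mathcal{S}} \alpha_\sigma \sigma - \varepsilon \bar\sigma + \varepsilon \hat\sigma.$$

Then, it follows that

$$\sum_{\sigma \in \mathcal{S}} \alpha'_\sigma \sigma \ge D$$

and $\sum_{\sigma} \alpha'_\sigma = \sum_{\sigma} \alpha_\sigma$. By repeating this procedure finitely many times, it follows
that we can reach a solution to PRIMAL'($D$) without changing the objective. This
completes the proof of Lemma 5.1.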
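
The weight-shifting step in the proof above can be sketched in code. The following is a minimal illustration, assuming schedules are 0/1 tuples, that the schedule set is monotone (switching one coordinate of a schedule to 0 gives another schedule, which is how Assumption 2.1 is used in the proof), and that the starting weights already serve at least the demand. The names `service` and `tighten` and the two-queue data are hypothetical.

```python
from fractions import Fraction


def service(alpha, n):
    """Componentwise service vector: sum over schedules of weight * schedule."""
    return tuple(sum(w * s[i] for s, w in alpha.items()) for i in range(n))


def tighten(alpha, demand):
    """Shift weight between schedules until the service vector equals demand.

    alpha maps each schedule (a tuple of 0/1) to a nonnegative weight.
    The total weight (the objective) is left unchanged by every shift.
    """
    alpha = dict(alpha)
    for i in range(len(demand)):
        while True:
            gap = sum(w * s[i] for s, w in alpha.items()) - demand[i]
            if gap <= 0:
                break
            # A schedule with positive weight that serves queue i (sigma-bar).
            bar = next(s for s, w in alpha.items() if s[i] == 1 and w > 0)
            # The same schedule with queue i switched off (sigma-hat).
            hat = bar[:i] + (0,) + bar[i + 1:]
            eps = min(alpha[bar], gap)
            alpha[bar] -= eps
            alpha[hat] = alpha.get(hat, Fraction(0)) + eps
    return {s: w for s, w in alpha.items() if w > 0}


# Hypothetical two-queue example: all weight starts on the schedule (1, 1).
demand = (Fraction(1, 2), Fraction(1, 4))
start = {(1, 1): Fraction(1)}
result = tighten(start, demand)
```

Each pass of the inner loop either closes the gap at queue $i$ or exhausts the weight on one schedule serving queue $i$, so the loop stops after finitely many shifts; coordinates other than $i$ are untouched because $\hat\sigma$ agrees with $\bar\sigma$ there. Exact rationals are used so that the tightness check is not affected by rounding.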

APPENDIX C: PROOF OF LEMMA 5.9 

First we note that under the SFA policy, BN is positive recurrent, given that
$\rho(\lambda) < 1$, by Theorem 4.1. Starting from any initial state, it also has a strictly
positive probability of reaching the null state $(M(\cdot), \mu(\cdot)) = 0$ at some finite time.
Since the evolution of the virtual system BN does not depend on that of SN, it is,
on its own, positive recurrent. Next we argue the positive recurrence of the entire
network state, building upon this property of BN.

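Sufficient conditions to establish positive recurrence of a discrete-time Markov
chain $X(\tau)$ on a Polish space $\mathsf{X}$ are given by (see [1, pp. 198-202] and [10, Section
4.2] for details):

C1. There exists a bounded set $A \in \mathcal{B}_{\mathsf{X}}$ such that

$$\mathbb{E}_x[T_A] < \infty, \quad \text{for any } x \in \mathsf{X}, \tag{62}$$

$$\sup_{x \in A} \mathbb{E}_x[T_A] < \infty. \tag{63}$$

In the above, the stopping time $T_A = \inf\{\tau \ge 1 : X(\tau) \in A\}$; the notation $\mathbb{P}_x(\cdot) =
\mathbb{P}(\cdot \mid X(0) = x)$ and $\mathbb{E}_x[\cdot] = \mathbb{E}[\cdot \mid X(0) = x]$.

C2. Given $A$ satisfying (62)-(63), there exist $x^* \in \mathsf{X}$, finite $\ell \ge 1$ and $\delta > 0$ such
that

$$\mathbb{P}_x(X(\ell) = x^*) \ge \delta, \quad \text{for any } x \in A, \tag{64}$$

$$\mathbb{P}_{x^*}(X(1) = x^*) > 0. \tag{65}$$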

Next, we verify conditions C1 and C2. Condition C1 follows immediately from the
following facts: (a) the BN is positive recurrent and hence $(M(\cdot), \mu(\cdot))$ returns to
the state 0 in finite expected time starting from any finite state; (b) $D(\cdot)$ is always bounded
due to Lemma 5.7; and (c) $Q(\cdot)$ returns to the bounded set $\{Q_i(\cdot) \le K(N + 2)\}$
whenever $M(\cdot) = 0$, due to Lemma 5.6. Condition C2 can be verified for the null
state, $x^* = 0$, as follows: (a) $(M(\cdot), \mu(\cdot))$ returns to the null state with positive
probability; (b) given this, it remains there for a further $K(N + 2) + 1$ time slots with
strictly positive probability, due to the Poisson arrival process; (c) in this additional
time $K(N + 2) + 1$, $Q(\cdot)$ and $D(\cdot)$ are driven to 0. To see (c), observe that
when $M(\cdot) = 0$, $D(\cdot) \in \mathbb{Z}_+^N$. By construction of our policy and Assumption 2.1 on the
structure of $\mathcal{S}$, it follows that if $M(\cdot)$ continues to remain 0, then $D_i(\cdot)$ is reduced
by at least a unit amount until $D(\cdot) = 0$, at which moment $Q(\cdot)$ reaches 0 as well. Since
$\sum_i D_i(\cdot) \le K(N + 2)$ by Lemma 5.7, it follows that $M(\cdot)$ needs to remain 0 for only
$K(N + 2) + 1$ time slots for this to happen. This completes the verification
of conditions C1 and C2. Consequently, the network Markov
chain, represented by $X(\cdot)$, is positive recurrent.

APPENDIX D: PROOF OF LEMMA 5.10 

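Proof. Let $X_j$ be independent geometric random variables with parameters $1 - \rho_j$,
$j = 1, 2, \dots, J$, and let $M = X_1 + X_2 + \cdots + X_J$. Consider some arbitrary $\ell \ge 1$. For
$k \ge \ell$, consider $\mathbb{P}(M = k)$. Then

$$\mathbb{P}(M = k) = \mathbb{P}\Big(\sum_{j=1}^{J} X_j = k\Big) = \sum_{k_1, k_2, \dots, k_J} \prod_{j=1}^{J} (1 - \rho_j)\, \rho_j^{k_j} = \prod_{j=1}^{J} (1 - \rho_j) \sum_{k_1, k_2, \dots, k_J} \prod_{j=1}^{J} \rho_j^{k_j},$$

where the sum is over all non-negative integers $k_1, k_2, \dots, k_J$ such that $k_1 + k_2 + \cdots + k_J = k$.
Consider the term $\sum_{k_1, k_2, \dots, k_J} \prod_{j=1}^{J} \rho_j^{k_j}$. Suppose that $\rho = \max_{j = 1, 2, \dots, J} \rho_j$;
then

$$\sum_{k_1, k_2, \dots, k_J} \prod_{j=1}^{J} \rho_j^{k_j} \le \sum_{k_1, k_2, \dots, k_J} \rho^{k}.$$

Now consider the number of $J$-tuples $(k_1, k_2, \dots, k_J)$ such that $k_1 + k_2 + \cdots + k_J = k$
and $k_j \in \mathbb{Z}_+$ for all $j$. There are $\binom{k + J - 1}{J - 1}$ of these tuples by a standard
stars-and-bars counting argument, and hence

$$\sum_{k_1, k_2, \dots, k_J} \prod_{j=1}^{J} \rho_j^{k_j} \le \binom{k + J - 1}{J - 1} \rho^{k}.$$

Therefore,

$$\mathbb{P}(M \ge \ell) = \sum_{k=\ell}^{\infty} \mathbb{P}(M = k) \le \prod_{j=1}^{J} (1 - \rho_j) \sum_{k=\ell}^{\infty} \binom{k + J - 1}{J - 1} \rho^{k} \le L_1 \sum_{k=\ell}^{\infty} k^{J-1} \rho^{k},$$

where $L_1$ is some constant that only depends on $J$. A further manipulation gives

$$\frac{1}{\ell} \log \mathbb{P}(M \ge \ell) \le \log \rho + L_2\, \frac{\log \ell}{\ell} + \frac{L_3}{\ell}$$

for some constants $L_2$ and $L_3$. Therefore,

$$\limsup_{\ell \to \infty} \frac{1}{\ell} \log \mathbb{P}(M \ge \ell) \le \log \rho = \max_{j = 1, 2, \dots, J} \log \rho_j. \qquad \square$$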
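
The decay rate in the last display can be observed numerically. The sketch below builds the exact distribution of a sum of independent geometric random variables by convolution and evaluates $\frac{1}{\ell} \log \mathbb{P}(M \ge \ell)$ for growing $\ell$. The loads `rhos`, the truncation parameter `extra`, and the function names are assumptions made for this illustration; the convention $\mathbb{P}(X_j = k) = (1 - \rho_j)\rho_j^k$ for $k \ge 0$ matches the one used in the proof.

```python
from math import log


def pmf_sum_of_geometrics(rhos, kmax):
    """pmf of M = X_1 + ... + X_J on {0, ..., kmax},
    where P(X_j = k) = (1 - rho_j) * rho_j**k for k = 0, 1, 2, ..."""
    pmf = [1.0] + [0.0] * kmax  # point mass at 0 for the empty sum
    for rho in rhos:
        geo = [(1 - rho) * rho ** k for k in range(kmax + 1)]
        pmf = [sum(pmf[a] * geo[k - a] for a in range(k + 1))
               for k in range(kmax + 1)]
    return pmf


def tail_rate(rhos, ell, extra=300):
    """(1/ell) * log P(M >= ell), with the tail sum truncated at ell + extra.

    The tail is summed directly (rather than as 1 minus a partial sum)
    to avoid cancellation when the tail probability is tiny.
    """
    pmf = pmf_sum_of_geometrics(rhos, ell + extra)
    return log(sum(pmf[ell:])) / ell


rhos = [0.5, 0.7, 0.8]  # hypothetical per-resource loads
rates = {ell: tail_rate(rhos, ell) for ell in (50, 100, 200)}
target = log(max(rhos))  # the limiting rate log(rho) from the proof
```

For these loads the values in `rates` move toward `target` as $\ell$ grows, consistent with the $O((\log \ell)/\ell)$ correction terms in the bound.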